cmd/compile: Swapping elements of a `[2]any` uses 2 separate writebarriers #62126

dans-stuff · 2023-08-18T02:14:52Z

What version of Go are you using (`go version`)?

$ go version
go version go1.20.1 darwin/amd64

Does this issue reproduce with the latest release?

Uncertain

What operating system and processor architecture are you using (`go env`)?

$ go env
GOARCH="amd64"
GOOS="darwin"

What did you do?

I was profiling some code and found that swapping 2 elements of a slice was slower than expected. When the elements were pointer types instead of interfaces, the swap reached the performance I expected.

This link shows the naive method of swapping elements, and the unsafe method which produces the asm and performance I expected: https://go.dev/play/p/NQZn03hsQU4

Here are the benchmarks:

BenchmarkSwapIntPtrs-8          842213125                1.422 ns/op
BenchmarkSwapAny-8              749056459                1.575 ns/op
BenchmarkSwapAnyAsUintptrs-8    812162583                1.471 ns/op

My assumption is that the compiler should generate code that matches the 3rd benchmark, where I use unsafe to force the asm I expected. Instead, the compiler-generated code is what the 2nd benchmark measures.

What did you expect to see?

For the code

arr[0], arr[1] = arr[1], arr[0]

where arr is an array of empty interfaces, I expected to see just one write barrier check, just as I see in the (paraphrased) asm for swapping a plain pointer type:

MOVQ the interfaces into registers
MOVQ both interface *type* parts into their new positions in the array
CMPL runtime.writeBarrier
MOVQ both interface *value* parts into their new positions in the array
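The playground code behind the link is not reproduced here, so this is a minimal sketch of the two swaps being compared, written as standalone functions. The names swapPtrs and swapAny are hypothetical. `//go:noinline` keeps each swap in its own function so its write barrier checks are easy to find when the program is built with `go build -gcflags=-S`, which prints the compiler's assembly. The exact instructions will vary between Go versions.

```go
package main

import "fmt"

// swapPtrs swaps two pointer elements; each element is a single pointer word.
//
//go:noinline
func swapPtrs(a *[2]*int) {
	a[0], a[1] = a[1], a[0]
}

// swapAny swaps two interface elements; each element is two words
// (a type word followed by a data word).
//
//go:noinline
func swapAny(a *[2]any) {
	a[0], a[1] = a[1], a[0]
}

func main() {
	x, y := 1, 2
	ps := [2]*int{&x, &y}
	swapPtrs(&ps)
	fmt.Println(*ps[0], *ps[1]) // 2 1

	as := [2]any{"first", 2}
	swapAny(&as)
	fmt.Println(as[0], as[1]) // 2 first
}
```

Comparing the assembly for swapPtrs and swapAny is one way to count the write barrier checks each swap compiles to on a given Go version.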

What did you see instead?

Instead, I found two write barrier checks: the asm wrote the value part of each interface under its own write barrier check, as in this (paraphrased) asm:

MOVQ the interfaces into registers
MOVQ both interface *type* parts into their new positions in the slice
CMPL runtime.writeBarrier
MOVQ one interface *value* into its new position in the slice
CMPL runtime.writeBarrier
MOVQ the other interface *value* into its new position in the slice

By using unsafe to treat the [2]any as a [4]uintptr and doing 2 sets of swaps, I was able to observe the asm doing just 1 write barrier check. I currently believe, though I may be wrong, that one check has the same semantics as two with regard to concurrent GC sweep correctness. I'm almost certain, however, that it has the same semantics with regard to concurrent observability guarantees.


dans-stuff · 2023-08-18T03:31:20Z

It appears my question is much simpler: why do consecutive assignments to interfaces require multiple writebarrier checks, and why is this not required for consecutive assignments to pointer types? I'm not aware of the internals of gc, but this asymmetry strikes me as an unnecessary cost.

randall77 · 2023-08-18T04:42:56Z

The difference you're seeing is that when writing two pointers in a row, we can merge those two pointer writes under a single barrier check. When you're writing interfaces, or strings, the writes are pointer-scalar-pointer-scalar, so we need two write barrier checks because the intervening scalar write prevents merging the barrier checks.

Note that for 1.21 the generated write barrier code has changed significantly. I think the behavior you observed is still present, but may have different performance characteristics. The new barriers also may make this easier to fix because the new barriers can be batched more easily. In particular, it should be easier to just plain ignore intervening scalar writes.

dans-stuff · 2023-08-18T04:45:38Z

That pointer-scalar split is what I was observing - though oddly, it reorders the scalars to be first, then the pointers to be next, and still splits them. I will try 1.21 to see if it is fixed.

Very helpful response though, thank you. It may be that in practice, the write barrier is negligible enough that it doesn't matter to perform multiple times.

randall77 · 2023-08-18T04:47:49Z

"it reorders the scalars to be first"

Yes, that's so the register holding the scalar is used first, so if it dies at that point it doesn't have to be saved around the write barrier call.
(Although that's an old justification, I'm not sure it matters in the last few versions of our barrier.)

randall77 · 2023-08-18T04:49:29Z

Here's a simple example for anyone looking at this:

type S struct {
	a *int
	b int
	c *int
}

func g(p *S) {
	p.a = nil
	p.b = 0
	p.c = nil
}

It would be nice if that compiled to a single barrier check + a gcWriteBarrier2 call, instead of two checks and two gcWriteBarrier1 calls.

gopherbot · 2023-08-21T19:33:33Z

Change https://go.dev/cl/521498 mentions this issue: cmd/compile: allow non-pointer writes in the middle of a write barrier

This lets us combine more write barriers, getting rid of some of the test+branch and gcWriteBarrier* calls. With the new write barriers, it's easy to add a few non-pointer writes to the set of values written. We allow up to 2 non-pointer writes between pointer writes. This is enough for, for example, adjacent slice fields.

Fixes golang#62126

Change-Id: I872d0fa9cc4eb855e270ffc0223b39fde1723c4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/521498
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>

gopherbot added the compiler/runtime label Aug 18, 2023

randall77 added the Performance label Aug 18, 2023

randall77 added this to the Unplanned milestone Aug 18, 2023

randall77 added the help wanted label Aug 18, 2023

dmitshur added the NeedsInvestigation label Aug 18, 2023

gopherbot closed this as completed in 556e9c5 Aug 23, 2023
