New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: miscompilation of partially-overlapping array assignments #54467
Comments
Yep, I think Speaking of which, |
arch == amd64 seems ok! |
Not ok on my amd64. |
@cuiweixie Why do you say amd64 seems ok? The program I linked to above should run and exit without printing anything; but on the Go playground, which uses amd64, it prints that |
this code is ok in amd64, but not arm64! |
Change https://go.dev/cl/425076 mentions this issue: |
@gopherbot please open backport issues for 1.18 and 1.17. |
Backport issue(s) opened: #54603 (for 1.18). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases. |
@gopherbot please open a backport issue for 1.19. (My off-by-one error: 1.17 is out of maintenance, so we don't need that backport.) |
The fix for the above issue on ARM64 appears to be treating the following code as a partially overlapping assignment and therefore calling runtime.memmove. The net result is an 80% performance penalty on benchmarks. Is this correct behavior, or has this simple use case been incorrectly tagged for this safer/slower approach? type Array [2]int
func moveSliceOfArrayData(s []Array) []Array {
for i := 1; i < len(s); i++ {
s[i-1], s[i] = s[i], s[i-1]
}
return s
} |
Is it good to unify the internal implementation of this and |
Change https://go.dev/cl/433056 mentions this issue: |
Partial overlaps can only happen for strict sub-pieces of larger arrays. That's a much stronger condition than the current optimization rules. Update #54467 Change-Id: I11e539b71099e50175f37ee78fddf69283f83ee5 Reviewed-on: https://go-review.googlesource.com/c/go/+/433056 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@W8CYE The performance regression you were seeing has now been fixed on tip. |
@randall77 My ARM64 benchmarks have not returned to the 1.18.5 or 1.19.0 level. The tests in
|
It fixes the code in #54467 (comment) for me.
True. The bug is arch-agnostic, just with slightly different constants. It only really needs a test on one arch.
That would be fine, or just post it here. |
I have run the benchmarks on AMD64 and ARM64. What I see is the AMD64 benchmarks are consistent from 1.18 -> TIP, but ARM64 shows significant spikes for 1.18.6, 1.19.1, and TIP. go<version> version
GOMAXPROCS=1 go<vesion> test -bench=BenchmarkSliceOfArray | grep BenchmarkSliceOfArray darwin/amd64
darwin/arm64
datamove.go package datamove
type Array [2]int
func moveSliceOfArrayData(s []Array) []Array {
for i := 1; i < len(s); i++ {
s[i-1], s[i] = s[i], s[i-1]
}
return s
} datamove_test.go package datamove
import (
"testing"
)
var resultArray []Array
func BenchmarkSliceOfArray(b *testing.B) {
var r []Array
s := make([]Array, 1000)
for n := 0; n < b.N; n++ {
r = moveSliceOfArrayData(s)
}
resultArray = r
} |
Hmm, yes I am seeing the performance changes you are. The memmove problem has been fixed, but there are other differences in the assembly between 1.19 and tip. If anything, the tip assembly looks better to me. But who am I to argue with the chip as to how fast assembly is? I'm going to open a new issue, this is getting complicated. |
Is there a reason why such assignments and the built-in package main
import "testing"
const N = 10000
var x = make([]byte, N)
func Benchmark_copy(b *testing.B) {
for i := 0; i < b.N; i++ {
copy(x, x)
}
}
func Benchmark_assign(b *testing.B) {
for i := 0; i < b.N; i++ {
*(*[N]byte)(x) = *(*[N]byte)(x)
}
}
|
Not sure. |
It is not compiled away for all such cases: package main
import "testing"
const N = 10000
var x = make([]byte, N+N)
var y = x[:N]
//go:noinline
func Copy(x, y []byte) {
copy(x, y)
}
//go:noinline
func Copy2(x, y []byte) {
*(*[N]byte)(x) = *(*[N]byte)(y)
}
func Benchmark_copy(b *testing.B) {
for i := 0; i < b.N; i++ {
Copy(x, y)
}
}
func Benchmark_assign(b *testing.B) {
for i := 0; i < b.N; i++ {
Copy2(x, y)
}
} I think the built-in |
That could be the difference. |
Yes, then it is similar. |
https://go.dev/play/p/QL9pJw6LY9e
When
p
andq
are pointers to arrays, the behavior of*p = *q
should be as if*q
is fully evaluated before any assignments through*p
take effect.The above program however demonstrates (at least on GOARCH=amd64) that when they're pointers to partially-overlapping arrays, the evaluation of
*q
and assignments through*p
are interleaved./cc @golang/compiler
The text was updated successfully, but these errors were encountered: