Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: treat "d = append(make([]T, 0, len(s)), s...)" as "d = make([]T, len(s)); copy(d, s)" #46079

Open
go101 opened this issue May 10, 2021 · 6 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@go101
Copy link

go101 commented May 10, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16.3 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What did you do?

package main

import (
	"testing"
)

type T = int

var sMakeCopy128 []T
func Benchmark_MakeCopy_128(b *testing.B) {
	s := make([]T, 128)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sMakeCopy128 = make([]T, len(s))
		copy(sMakeCopy128, s)
	}
}

var sMakeAppend128 []T
func Benchmark_MakeAppend_128(b *testing.B) {
	s := make([]T, 128)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sMakeAppend128 = append(make([]T, 0, len(s)), s...)
	}
}

var sMakeCopy1024 []T
func Benchmark_MakeCopy_1024(b *testing.B) {
	s := make([]T, 1024)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sMakeCopy1024 = make([]T, len(s))
		copy(sMakeCopy1024, s)
	}
}

var sMakeAppend1024 []T
func Benchmark_MakeAppend_1024(b *testing.B) {
	s := make([]T, 1024)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sMakeAppend1024 = append(make([]T, 0, len(s)), s...)
	}
}

var sMakeCopy8192 []T
func Benchmark_MakeCopy_8192(b *testing.B) {
	s := make([]T, 8192)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sMakeCopy8192 = make([]T, len(s))
		copy(sMakeCopy8192, s)
	}
}

var sMakeAppend8192 []T
func Benchmark_MakeAppend_8192(b *testing.B) {
	s := make([]T, 8192)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sMakeAppend8192 = append(make([]T, 0, len(s)), s...)
	}
}

var sMakeCopy65536 []T
func Benchmark_MakeCopy_65536(b *testing.B) {
	s := make([]T, 65536)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sMakeCopy65536 = make([]T, len(s))
		copy(sMakeCopy65536, s)
	}
}

var sMakeAppend65536 []T
func Benchmark_MakeAppend_65536(b *testing.B) {
	s := make([]T, 65536)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sMakeAppend65536 = append(make([]T, 0, len(s)), s...)
	}
}

What did you expect to see?

No performance difference between the two ways.

What did you see instead?

The larger the length of the slices, the bigger performance difference between the two.

$ go test -bench=. -benchmem -benchtime=3s
goos: linux
goarch: amd64
pkg: a.b/c/ttt
cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Benchmark_MakeCopy_128-4       	14257712	       254.3 ns/op	    1024 B/op	       1 allocs/op
Benchmark_MakeAppend_128-4     	13141005	       268.5 ns/op	    1024 B/op	       1 allocs/op
Benchmark_MakeCopy_1024-4      	 1885544	      1928 ns/op	    8192 B/op	       1 allocs/op
Benchmark_MakeAppend_1024-4    	 1663520	      2369 ns/op	    8192 B/op	       1 allocs/op
Benchmark_MakeCopy_8192-4      	  268628	     14040 ns/op	   65536 B/op	       1 allocs/op
Benchmark_MakeAppend_8192-4    	  196272	     18757 ns/op	   65536 B/op	       1 allocs/op
Benchmark_MakeCopy_65536-4     	   36700	    100680 ns/op	  524291 B/op	       1 allocs/op
Benchmark_MakeAppend_65536-4   	   24723	    136324 ns/op	  524288 B/op	       1 allocs/op

[Edit]: it looks the allocated bytes/op is the not same for the last two benchmarks. Bug or some inaccuracy?

@randall77
Copy link
Contributor

Looks like the difference is that the faster code is able to elide zeroing the allocation.

@randall77 randall77 added this to the Unplanned milestone May 10, 2021
@randall77 randall77 added the NeedsFix The path to resolution is known, but the work has not been done. label May 10, 2021
@martisch
Copy link
Contributor

martisch commented May 10, 2021

I dont think its clear cut we should do this. I would rather prefer that programmers wanting the increased performance write make+copy directly. Otherwise we will go down a path of adding more and more complexity rewriting append patterns in the compiler that are more idiomatic in my opinion to be written as e.g. make+copy in the first place.

@bcmills
Copy link
Contributor

bcmills commented May 10, 2021

Otherwise we will go down a path of adding more and more complexity rewriting append patterns in the compiler that are more idiomatic in my opinion to be written as e.g. make+copy in the first place.

FWIW, I don't think make+copy is particularly idiomatic either — I don't think this sort of pattern occurs frequently enough to have a clear “idiom”.

I will, however, point out that except for the string special-cases, copy can be defined in terms of append (https://go2goplay.golang.org/p/nfU-IUC7l7k):

func copy[S SliceOf[T], T any](dst, src S) int {
	n := len(src)
	if n > len(dst) {
		n = len(dst)
	}
	_ = append(dst[:0], src[:n]...)
	return n
}

In contrast, append cannot be defined in terms of copy, because it has additional allocation behavior that copy lacks — and the allocation behavior is explicitly what the user is concerned with in the examples here.

So it seems to me much more natural for users to reach for append whenever allocation is involved, and copy only when the function doesn't need to allocate in the same step.

At any rate, given that copy can be defined in terms of append, and given that append avoids unnecessary zeroing in many other cases, I think it's strange for a pattern involving append to redundantly zero out data when copy does not.

@martisch
Copy link
Contributor

martisch commented May 10, 2021

and the allocation behavior is explicitly what the user is concerned with in the examples here.

But the allocation in the example here is explicitly not done by append but by the make. The append here is used only for the copy part.

The suggestion here is not to rewrite append to make+copy but to rewrite make+append to make+copy. The append here is used to allow to write it as one expression returning a copy.

Once generics are around a slices.Clone function might ideally take care of a more concise expression to clone a slice in one expression.

Another way to write a slice clone is append([]T{}, s...)[:len(s):len(s)] which I think would also need to be considered to be performance optimized similar to make+append.

@go101
Copy link
Author

go101 commented May 24, 2021

It would be great if the same optimization also works for the following two make calls.

func InsertVerbose(s []int, k int, vs ...int) []int {
	if n := len(s) + len(vs); n <= cap(s) {
		s2 := s[:n]
		copy(s2[k+len(vs):], s[k:])
		copy(s2[k:], vs)
		return s2
	}
	s2 := make([]int, len(s) + len(vs))
	copy(s2, s[:k])
	copy(s2[k:], vs)
	copy(s2[k+len(vs):], s[k:])
	return s2
}


func InsertVerbose_b(s []int, k int, vs ...int) []int {
	if n := len(s) + len(vs); n <= cap(s) {
		s2 := s[:n]
		copy(s2[k+len(vs):], s[k:])
		copy(s2[k:], vs)
		return s2
	}
	s2 := make([]int, 0, len(s) + len(vs))
	s2 = append(s2, s[:k]...)
	s2 = append(s2, vs...)
	s2 = append(s2, s[k:]...)
	return s2
}

[update]:

func MergeSlices(data ...[]int) []int {
	n := 0
	for _, s := range data {
		n += len(s)
	}
	r := make([]int, 0, n)
	for _, s := range data {
		r = append(r, s...)
	}
	return r
}

@go101
Copy link
Author

go101 commented Jul 29, 2021

Not sure whether or not the following make call reset elements:

  s := make([]byte, len(aString) + len(aByteSlice))
  copy(s, aString)
  copy(s[len(aString):], aByteSlice)

[update] After some investigations, if looks the element in s[:len(aString)] are not reset to zero, but others are reset.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

5 participants