
bytes: bytes.Clone(), using copy is faster than append. #55905

Open
imkos opened this issue Sep 28, 2022 · 6 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@imkos

imkos commented Sep 28, 2022

```go
// default
func clone(b []byte) []byte {
	if b == nil {
		return nil
	}
	return append([]byte{}, b...)
}

// fast
func cloneV2(b []byte) []byte {
	if b == nil {
		return nil
	}
	nb := make([]byte, len(b))
	copy(nb, b)
	return nb
}

// normal
func cloneV3(b []byte) []byte {
	if b == nil {
		return nil
	}
	nb := make([]byte, 0, len(b))
	return append(nb, b...)
}

var toTestB = []byte("JFIOSJFOISJFOIAJFKSDFSDlkadjfskfdskdfj31231111111111111111111111111111111111111111111")

func Benchmark_clone(b *testing.B) {
	for i := 0; i < b.N; i++ {
		clone(toTestB)
	}
}

func Benchmark_cloneV2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		cloneV2(toTestB)
	}
}

func Benchmark_cloneV3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		cloneV3(toTestB)
	}
}
```

```
Benchmark_clone-6      35159740    74.59 ns/op    96 B/op    1 allocs/op
Benchmark_cloneV2-6    43229692    59.33 ns/op    96 B/op    1 allocs/op
Benchmark_cloneV3-6    39171712    71.48 ns/op    96 B/op    1 allocs/op
```

@dmitshur dmitshur added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 28, 2022
@dmitshur dmitshur added this to the Backlog milestone Sep 28, 2022
@dmitshur
Contributor

CC @icholy, @martisch, @ianlancetaylor.

@dsnet
Member

dsnet commented Sep 28, 2022

The title seems to suggest that bytes should use copy rather than append. Perhaps we can view this instead as figuring out why append is slower than copy in this case and optimize that instead?

@martisch
Contributor

martisch commented Sep 28, 2022

Generally, append needs to find the smallest allocation size class that fits the slice, may then allocate a larger slice with more capacity than the original, and may need to zero the additional space. So it will often do more work than make+copy with the same length, as it provides extra bonus capacity that would otherwise be unused memory.

In addition, make+copy is optimized by the compiler: https://go-review.git.corp.google.com/c/go/+/146719
A comparable fast path for append that would recognize append([]byte{}, b...) does not exist yet.

While we can make append([]byte{}, b...) faster in general, especially for bytes, it cannot be as fast as make+copy.
The decision to prefer append, which provides additional capacity (at the cost of being slower initially), has already been made for exp/slices.Clone and was copied to bytes.Clone.
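The capacity difference described above can be observed directly. A minimal sketch (the exact rounded-up capacity is an implementation detail of the gc toolchain's size classes, so only a hedged expectation is noted in the comments):

```go
package main

import "fmt"

func main() {
	b := make([]byte, 86) // same length as the benchmarked slice

	viaAppend := append([]byte{}, b...) // bytes.Clone-style
	viaMake := make([]byte, len(b))     // make+copy-style
	copy(viaMake, b)

	// On the gc toolchain, append typically rounds the capacity up to the
	// allocation size class (96 bytes here), while make keeps cap == len.
	fmt.Println(len(viaAppend), cap(viaAppend))
	fmt.Println(len(viaMake), cap(viaMake))
}
```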

Same issue just for slices.Clone: #53643

@go101

go101 commented Sep 29, 2022

"make+copy" will also find the smallest allocation size class. So the actual reason why "append" is slower is that it zeroes the additional space.

Currently, the "make+copy" optimization has many restrictions. For example, calling make with three arguments will not trigger the optimization.
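A minimal sketch of that restriction, assuming the gc compiler's pattern matching as described in the CL linked earlier: both functions below behave identically, and only whether the make+copy optimization can apply differs.

```go
package main

import "fmt"

// cloneTwoArg uses the two-argument form of make, which the compiler's
// make+copy optimization can pattern-match.
func cloneTwoArg(b []byte) []byte {
	nb := make([]byte, len(b))
	copy(nb, b)
	return nb
}

// cloneThreeArg passes an explicit capacity; per the comment above, this
// three-argument form does not trigger the optimization.
func cloneThreeArg(b []byte) []byte {
	nb := make([]byte, len(b), len(b))
	copy(nb, b)
	return nb
}

func main() {
	b := []byte("abc")
	fmt.Printf("%s %s\n", cloneTwoArg(b), cloneThreeArg(b)) // prints "abc abc"
}
```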

@martisch
Contributor

martisch commented Sep 29, 2022

For both "make+copy" and append, the allocator will find the smallest allocation size class, but "append" additionally does this explicitly before calling the allocator, because the allocator has no mode to allocate as many elements of a given type as fit the size class and report back how much was allocated.

"make+copy" just uses the length as is to compute the value for mallocgc:

```go
tomem = et.size * uintptr(tolen)
```

"append" first calculates the size class explicitly and the length that fits in it before calling mallocgc:

```go
capmem = roundupsize(uintptr(newcap))
```

Using roundupsize, "append" explicitly searches for the size class before calling the allocator, which "make+copy" does not.
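The rounding described above can be sketched in user code. This is not the runtime's actual roundupsize; the class table below lists just a few of the runtime's small-object size classes for illustration:

```go
package main

import "fmt"

// A few of the runtime's small-object size classes, for illustration only.
var sizeClasses = []uintptr{8, 16, 24, 32, 48, 64, 80, 96, 112, 128}

// roundupSize mimics what append's growslice does conceptually: round a
// requested byte size up to the next allocation size class.
func roundupSize(size uintptr) uintptr {
	for _, c := range sizeClasses {
		if size <= c {
			return c
		}
	}
	return size // larger objects are rounded differently by the real allocator
}

func main() {
	// The 86-byte slice from the benchmark lands in the 96-byte class,
	// matching the 96 B/op reported by all three benchmarks.
	fmt.Println(roundupSize(86)) // prints 96
}
```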

In general, growslice, which is used by append, is less specialized than "make+copy" (in the general use where it is optimized): it computes more parameters and has to handle different cases (no additional capacity vs. additional capacity, the existing slice being appended to already fitting the added items) that "make+copy" does not. This is both because the compiler specializes "make+copy" more for different cases and because "make+copy" generally needs to do less.

If this is important enough and does not degrade performance in the general case the following can be evaluated:

  • specialize append more by having the compiler select a specialized implementation based on constraints on the length/capacity parameters (e.g. append to a slice known to have length 0)
  • specialize append more by having the compiler select a specialized implementation decided at compile time by element type (e.g. append for 0- or 1-byte elements, or elements with or without pointers), so that the runtime overhead inside append is reduced
  • add a mode to mallocgc such that it can decide to fill a size class and report the length back (I expect this not to make a great difference and not to be a net win, as it will add overhead to mallocgc calls that don't need this feature; it also has only one use case so far, and that's growslice)

On the high level I think we first need consensus if the issue is to be:

  1. use make+copy in Clone, as it will always be faster if the same amount of optimization is applied, giving up append's advantage of extra capacity. But then exp/slices.Clone should do the same; #53643 (cmd/compile: suboptimal cloning/optimization in slices.Clone) seems to suggest we won't do that but should, if anything, optimize more in the compiler.
  2. a general call to optimize append more (see above), so we do not need to keep one issue open per instance where the performance difference is observed.

I would think we can deduplicate this into issue #53643 and comment there that this would also benefit bytes.Clone.

@imkos
Author

imkos commented Sep 29, 2022

My idea is that since the internal implementation of Clone() uses append, it should be optimized as much as possible, because this function will be called a lot.

No branches or pull requests

5 participants