cmd/compile: self assigned append should not escape #51462

dsnet · 2022-03-03T20:02:07Z

Using go1.17.5.

bb := &Buffer{buf: make([]byte, 0, 64)}
bb.buf = append(bb.buf)

The buffer for make([]byte, 0, 64) escapes to the heap because escape analysis apparently cannot determine that bb.buf = append(bb.buf) does not cause buf to escape.

I expect the code above to be able to stack allocate make([]byte, 0, 64).

cc @josharian @mdempsky

@josharian theorizes that we might need a special case in https://github.com/golang/go/blame/master/src/cmd/compile/internal/escape/utils.go


randall77 · 2022-03-03T22:00:04Z

Dup of #27772 ?

dsnet · 2022-03-03T22:04:34Z

> Dup of #27772 ?

Possibly. #27772 seems like a more general issue, while this issue targets a specific case that seems more tenable to fixing in the short term.

beoran · 2022-03-05T13:12:02Z

In this case, the append does nothing, though. I feel that this is a case of an incorrect use of append, so that doesn't matter much.

josharian · 2022-03-05T14:57:14Z

I believe that it also reproduces with non-trivial uses of append, and that making it non-trivial doesn't alter the escape analysis. This showed up in practice in https://go-review.googlesource.com/c/go/+/349994.

Jorropo · 2022-03-08T00:28:58Z

> In this case, the append does nothing, though. I feel that this is a case of an incorrect use of append, so that doesn't matter much.

I sometimes see things like this:

// assume b, c and d sizes are known and small enough
a := make([]byte, len(b)+len(c)+len(d))[:0]
a = append(a, b...)
a = append(a, c...)
a = append(a, d...)

It is measurably faster than initialising like this:

var a []byte // = nil

It is also more readable and less error prone than a copy-based version (even though copy would be somewhat faster here).
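To make the comparison concrete, here is a small sketch of the two styles side by side. The function names concatAppend and concatCopy are made up for illustration; neither appears in the thread, and no timing claim is implied by the code itself.

```go
package main

import "fmt"

// concatAppend preallocates the full capacity, then appends each part.
// No offsets need to be tracked by hand.
func concatAppend(b, c, d []byte) []byte {
	a := make([]byte, 0, len(b)+len(c)+len(d))
	a = append(a, b...)
	a = append(a, c...)
	a = append(a, d...)
	return a
}

// concatCopy allocates the full length and copies each part into place,
// tracking the write offset manually.
func concatCopy(b, c, d []byte) []byte {
	a := make([]byte, len(b)+len(c)+len(d))
	n := copy(a, b)
	n += copy(a[n:], c)
	copy(a[n:], d)
	return a
}

func main() {
	b, c, d := []byte("foo"), []byte("bar"), []byte("baz")
	fmt.Printf("%s %s\n", concatAppend(b, c, d), concatCopy(b, c, d))
}
```

The append version has no index arithmetic to get wrong, which is the readability point made above; the copy version trades that for explicit offsets.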

gopherbot · 2022-03-09T18:43:26Z

Change https://go.dev/cl/349994 mentions this issue: bytes: rely on runtime.growslice for growing

Rather than naively making a slice of capacity 2*c+n, rely on the append(..., make(...)) pattern to allocate a slice that aligns up to the closest size class.

Performance:

    name                       old time/op    new time/op    delta
    BufferWriteBlock/N4096     3.03µs ± 6%    2.04µs ± 6%    -32.60%  (p=0.000 n=10+10)
    BufferWriteBlock/N65536    47.8µs ± 6%    28.1µs ± 2%    -41.32%  (p=0.000 n=9+8)
    BufferWriteBlock/N1048576  844µs ± 7%     510µs ± 5%     -39.59%  (p=0.000 n=8+9)

    name                       old alloc/op   new alloc/op   delta
    BufferWriteBlock/N4096     12.3kB ± 0%    7.2kB ± 0%     -41.67%  (p=0.000 n=10+10)
    BufferWriteBlock/N65536    258kB ± 0%     130kB ± 0%     -49.60%  (p=0.000 n=10+10)
    BufferWriteBlock/N1048576  4.19MB ± 0%    2.10MB ± 0%    -49.98%  (p=0.000 n=10+8)

    name                       old allocs/op  new allocs/op  delta
    BufferWriteBlock/N4096     3.00 ± 0%      3.00 ± 0%      ~  (all equal)
    BufferWriteBlock/N65536    7.00 ± 0%      7.00 ± 0%      ~  (all equal)
    BufferWriteBlock/N1048576  11.0 ± 0%      11.0 ± 0%      ~  (all equal)

The performance is faster since the growth rate is capped at 2x, while previously it could grow by amounts potentially much greater than 2x, leading to significant amounts of memory waste and extra copying.

Credit goes to Martin Möhrmann for suggesting the append(b, make([]T, n)...) pattern.

Fixes #42984
Updates #51462

Change-Id: I7b23f75dddbf53f8b8b93485bb1a1fff9649b96b
Reviewed-on: https://go-review.googlesource.com/c/go/+/349994
Trust: Joseph Tsai <joetsai@digital-static.net>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
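The append(b, make([]T, n)...) pattern mentioned in the commit message can be sketched as follows. This is an illustration of the idiom only, not the code from the CL; the helper name grow is hypothetical.

```go
package main

import "fmt"

// grow returns a slice with the same contents and length as b but with room
// for at least n more bytes. Appending a temporary zeroed slice lets the
// runtime pick the new capacity; reslicing restores the original length
// while keeping the larger capacity.
func grow(b []byte, n int) []byte {
	return append(b, make([]byte, n)...)[:len(b)]
}

func main() {
	b := []byte("abc")
	g := grow(b, 100)
	// The exact capacity is chosen by the runtime, so only a lower bound holds.
	fmt.Println(string(g), len(g), cap(g) >= len(b)+100)
}
```

The resulting capacity is whatever the runtime's growth logic selects, which is the point of the pattern: the caller asks for at least n extra bytes and leaves rounding to the allocator.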

quasilyte · 2022-03-20T13:20:02Z

Leaving this here for completeness: https://go-review.googlesource.com/c/go/+/370578

gopherbot · 2023-02-20T02:15:37Z

Change https://go.dev/cl/469556 mentions this issue: encoding/json: unify encodeState.string and encodeState.stringBytes

This is part of the effort to reduce direct reliance on bytes.Buffer so that we can use a buffer with better pooling characteristics.

Unify these two methods as a single version that uses generics to reduce duplicated logic. Unfortunately, we lack a generic version of utf8.DecodeRune (see #56948), so we cast []byte to string. The []byte variant is slightly slower for multi-byte unicode since casting results in a stack-allocated copy operation. Fortunately, this code path is used only for TextMarshalers. We can also delete TestStringBytes, which exists to ensure that the two duplicate implementations remain in sync.

Performance:

    name              old time/op    new time/op    delta
    CodeEncoder       399µs ± 2%     409µs ± 2%     +2.59%  (p=0.000 n=9+9)
    CodeEncoderError  450µs ± 1%     451µs ± 2%     ~  (p=0.684 n=10+10)
    CodeMarshal       553µs ± 2%     562µs ± 3%     ~  (p=0.075 n=10+10)
    CodeMarshalError  733µs ± 3%     737µs ± 2%     ~  (p=0.400 n=9+10)
    EncodeMarshaler   24.9ns ±12%    24.1ns ±13%    ~  (p=0.190 n=10+10)
    EncoderEncode     12.3ns ± 3%    14.7ns ±20%    ~  (p=0.315 n=8+10)

    name              old speed      new speed      delta
    CodeEncoder       4.87GB/s ± 2%  4.74GB/s ± 2%  -2.53%  (p=0.000 n=9+9)
    CodeEncoderError  4.31GB/s ± 1%  4.30GB/s ± 2%  ~  (p=0.684 n=10+10)
    CodeMarshal       3.51GB/s ± 2%  3.46GB/s ± 3%  ~  (p=0.075 n=10+10)
    CodeMarshalError  2.65GB/s ± 3%  2.63GB/s ± 2%  ~  (p=0.400 n=9+10)

    name              old alloc/op   new alloc/op   delta
    CodeEncoder       327B ±347%     447B ±232%     +36.93%  (p=0.034 n=9+10)
    CodeEncoderError  142B ± 1%      143B ± 0%      ~  (p=1.000 n=8+7)
    CodeMarshal       1.96MB ± 2%    1.96MB ± 2%    ~  (p=0.468 n=10+10)
    CodeMarshalError  2.04MB ± 3%    2.03MB ± 1%    ~  (p=0.971 n=10+10)
    EncodeMarshaler   4.00B ± 0%     4.00B ± 0%     ~  (all equal)
    EncoderEncode     0.00B          0.00B          ~  (all equal)

    name              old allocs/op  new allocs/op  delta
    CodeEncoder       0.00           0.00           ~  (all equal)
    CodeEncoderError  4.00 ± 0%      4.00 ± 0%      ~  (all equal)
    CodeMarshal       1.00 ± 0%      1.00 ± 0%      ~  (all equal)
    CodeMarshalError  6.00 ± 0%      6.00 ± 0%      ~  (all equal)
    EncodeMarshaler   1.00 ± 0%      1.00 ± 0%      ~  (all equal)
    EncoderEncode     0.00           0.00           ~  (all equal)

There is a very slight performance degradation for CodeEncoder due to an increase in allocation sizes. However, the number of allocations did not change. This is likely due to remote effects of the growth rate differences between bytes.Buffer and the builtin append function. We shouldn't overly rely on the growth rate of bytes.Buffer anyways since that is subject to possibly change in #51462. As the benchtime increases, the alloc/op goes down indicating that the amortized memory cost is fixed.

Updates #27735

Change-Id: Ie35e480e292fe082d7986e0a4d81212c1d4202b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/469556
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Auto-Submit: Joseph Tsai <joetsai@digital-static.net>

dsnet added the Performance label Mar 3, 2022

cagedmantis added the NeedsInvestigation label Mar 7, 2022

cagedmantis added this to the Backlog milestone Mar 7, 2022

gopherbot added the compiler/runtime label Jul 13, 2022
