runtime: Append consecutive slices, same underlying array #17530

npat-efault · 2016-10-20T17:09:46Z

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (`go version`)?

go version go1.7.3 linux/amd64

What operating system and processor architecture are you using (`go env`)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/npat"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build296609692=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

What did you do?

It seems that the built-in append does not check for a situation like the following (appending consecutive slices, with the same underlying array) , and needlessly copies data, when it could avoid it.

  func appendBytes(a, b []byte) []byte {
          if len(b) == 0 {
                  return a
          }
          // Shouldn't append() check for something like this?
          if cap(a) >= len(a)+len(b) && &a[:len(a)+1][len(a)] == &b[0] {
                  return a[:len(a)+len(b)]
          }
          return append(a, b...)
  }

If I'm following the breadcrumbs correctly, they lead to the fact that runtime-memmove() does not short-circuit when source and destination addresses are the same (I may be wrong, though).

Running the simple benchmarks at:

https://gist.github.com/npat-efault/f055654633f43d0ef771d38657fd499e

for any value of N, demonstrates this.

What did you expect to see?

Built-in append detecting the trivial case and behaving (performance-wise) like appendBytes above, running time being independent of slice b size

What did you see instead?

Running time increases with slice b size.

The text was updated successfully, but these errors were encountered:

cznic · 2016-10-20T17:20:08Z

I think this is a rarely occurring optimization opportunity. That said, the cost for its detection, paid in all other, more often seen cases, is possibly high.

Any real world examples which would be helped by this?

npat-efault · 2016-10-20T17:44:15Z

I believe that the cost of runtime-memmove() detecting that source and destination addresses are the same, is minimal (one or two instructions). Almost barely noticeable and only for extremely small slices.

Then again, you are right, it is rarely occurring.

cznic · 2016-10-20T17:56:27Z

I doubt cap(a) >= len(a)+len(b) && &a[:len(a)+1][len(a)] == &b[0] can compile to one or two instructions.

randall77 · 2016-10-20T18:00:11Z

I agree that this seems super rare and not worth the code to check for it.
If you know you're in this situation (or might be), you can check yourself and reslice to avoid the copy.
Closing.

npat-efault · 2016-10-20T18:11:00Z

@cznic: I don't think you have to do exactly this. This is purely for demonstration purposes (the best I could do with "user code"). All you'd have to do is, in fact, add a CMPQ DI, SI; JEQ move_0 (or something similar, I'm not familiar with the go assembler) to the runtime-memmove() implementation.

If I'm not mistaken.

randall77 · 2016-10-20T18:50:53Z

Right, memmove could do a pretty simple check. But we're not going to add that absent some real code which cares.

npat-efault · 2016-10-20T19:08:33Z

I was thinking of code along the lines of this:

    parsed := src[0:0]
    for len(src) > 0 {
        n := keep(src)
        parsed = append(parsed, src[:n]...)
        src = src[n:]
        n = drop(src)
        src = src[n:]
    }

Arguably it may not be the best way to code something like this, but it is not totally unnatural. In this case, if drop() returns 0, you should not have to pay for the extra copy...

Anyway, as you said, it can be avoided.

I just thought I'd point-out the issue (and the fact that it can easily be fixed). Whether something should be done about it, is a judgement call, and is up to you...

npat-efault · 2016-10-20T19:48:28Z

Fixed a couple of typos in the example code of my previous comment

nigeltao · 2016-10-20T22:55:16Z

You could write it like this:

// r and w are read and write cursors for src.
r, w := 0, 0
for r < len(src) {
    n := keep(src[r:])
    if r != w {
        copy(src[w:], src[r:r+n])
    }
    r, w = r+n, w+n
    r += drop(src[r:])
}
parsed := src[:w]

npat-efault · 2016-10-20T23:18:40Z

@nigeltao Yes, of course I could. And there are probably other ways, as well. The thing is, the code I posted was the first thing that came to my mind (instinctively). And immediately I though: "I guess append() will be clever enough to avoid the copy and just do a re-slice (extend)". I checked, and it turns out it isn't. Which was a bit surprising. Especially considering that the cost of detecting the (uncommon) special case turns out to be negligible. I think (and this is a judgement call) that doing "the right thing" in this case justifies the (almost negligible) cost.

npat-efault · 2016-10-20T23:28:12Z

@nigeltao Also, in my original case, parsed was passed as an argument, and it could, or could not, "coincide" with src. Yes you could also handle this with some extra code, but this is besides the point... (it's the "asymmetry" that matters)

sprstnd · 2016-10-23T21:33:22Z

so redundant compiler/runtime checks are fine, but this extra check is a concern? poppycock!

minux · 2016-10-25T18:13:06Z

Whether the cost is negligible or not is the deciding factor
for adding a special case.

If the case only happens one in a million times, then no
matter how cheap the optimization is, we shouldn't add
it.

And the optimization cost is non-negligible anyway as
it will at least cost one more entry in the branch predictor
tables.

That being said, I instrumented the memmove routine
and found that runtime.typedmemmove actually does
trigger runtime.memmove with src == dest. Perhaps it's
worthwhile to investigate whether we can eliminate such
unnecessary memmove calls.

minux · 2016-10-25T18:25:34Z

self memmove is not rare during all.bash, but the count is almost never larger than 0x20 bytes. Most of the cases come from math/big, where the API always takes a result parameter, and in a lot of cases, the final step is to copy the result into the result parameter. We will need a more extensive benchmark analysis to see whether memmove should special case this.

npat-efault · 2016-10-25T19:31:18Z

Ah, I see! The same memmove is used in many cases, other than slice-copy, slice-append, and the likes. A lot of them may be very small-size moves, where the cost of adding the check could be considerable (or maybe not). We could consider adding the check only for slice appends (which is more complicated that adding a simple compare to memmove). In this case, though, it would mean adding a couple of extra instructions at every append call site... not an obvious gain.

npat-efault · 2016-10-25T19:52:24Z

What if we add the test to memmove for all but the smallest sizes? Just a thought.

randall77 · 2016-10-25T21:54:17Z

Adding the test above a size threshold would certainly bound the overhead
of the check.

On Tue, Oct 25, 2016 at 12:53 PM, Nick Patavalis notifications@github.com
wrote:

What if we add the test to memmove for all but the smallest sizes? Just a
thought.

—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
#17530 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGkgIFHP3HmXrnZ-X2DxuPUP4y6Y0pLCks5q3l4kgaJpZM4KcYDI
.

randall77 closed this as completed Oct 20, 2016

mikioh changed the title ~~Append consecutive slices, same underlying array~~ runtime: Append consecutive slices, same underlying array Oct 22, 2016

golang locked and limited conversation to collaborators Oct 25, 2017

gopherbot added the FrozenDueToAge label Oct 25, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: Append consecutive slices, same underlying array #17530

runtime: Append consecutive slices, same underlying array #17530

npat-efault commented Oct 20, 2016

cznic commented Oct 20, 2016

npat-efault commented Oct 20, 2016

cznic commented Oct 20, 2016

randall77 commented Oct 20, 2016

npat-efault commented Oct 20, 2016 •

edited

randall77 commented Oct 20, 2016

npat-efault commented Oct 20, 2016 •

edited

npat-efault commented Oct 20, 2016 •

edited

nigeltao commented Oct 20, 2016 •

edited

npat-efault commented Oct 20, 2016

npat-efault commented Oct 20, 2016 •

edited

sprstnd commented Oct 23, 2016 •

edited

minux commented Oct 25, 2016

minux commented Oct 25, 2016 via email

npat-efault commented Oct 25, 2016 •

edited

npat-efault commented Oct 25, 2016

randall77 commented Oct 25, 2016

runtime: Append consecutive slices, same underlying array #17530

runtime: Append consecutive slices, same underlying array #17530

Comments

npat-efault commented Oct 20, 2016

What version of Go are you using (go version)?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

cznic commented Oct 20, 2016

npat-efault commented Oct 20, 2016

cznic commented Oct 20, 2016

randall77 commented Oct 20, 2016

npat-efault commented Oct 20, 2016 • edited

randall77 commented Oct 20, 2016

npat-efault commented Oct 20, 2016 • edited

npat-efault commented Oct 20, 2016 • edited

nigeltao commented Oct 20, 2016 • edited

npat-efault commented Oct 20, 2016

npat-efault commented Oct 20, 2016 • edited

sprstnd commented Oct 23, 2016 • edited

minux commented Oct 25, 2016

minux commented Oct 25, 2016 via email

npat-efault commented Oct 25, 2016 • edited

npat-efault commented Oct 25, 2016

randall77 commented Oct 25, 2016

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

npat-efault commented Oct 20, 2016 •

edited

npat-efault commented Oct 20, 2016 •

edited

npat-efault commented Oct 20, 2016 •

edited

nigeltao commented Oct 20, 2016 •

edited

npat-efault commented Oct 20, 2016 •

edited

sprstnd commented Oct 23, 2016 •

edited

npat-efault commented Oct 25, 2016 •

edited