New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: Append consecutive slices, same underlying array #17530
Comments
I think this is a rarely occurring optimization opportunity. That said, the cost for its detection, paid in all other, more often seen cases, is possibly high. Any real world examples which would be helped by this? |
I believe that the cost of Then again, you are right, it is rarely occurring. |
I doubt |
I agree that this seems super rare and not worth the code to check for it. |
@cznic: I don't think you have to do exactly this. This is purely for demonstration purposes (the best I could do with "user code"). All you'd have to do is, in fact, add a If I'm not mistaken. |
Right, memmove could do a pretty simple check. But we're not going to add that absent some real code which cares. |
I was thinking of code along the lines of this:
Arguably it may not be the best way to code something like this, but it is not totally unnatural. In this case, if Anyway, as you said, it can be avoided. I just thought I'd point-out the issue (and the fact that it can easily be fixed). Whether something should be done about it, is a judgement call, and is up to you... |
Fixed a couple of typos in the example code of my previous comment |
You could write it like this:
|
@nigeltao Yes, of course I could. And there are probably other ways, as well. The thing is, the code I posted was the first thing that came to my mind (instinctively). And immediately I though: "I guess |
@nigeltao Also, in my original case, |
so redundant compiler/runtime checks are fine, but this extra check is a concern? poppycock! |
Whether the cost is negligible or not is the deciding factor If the case only happens one in a million times, then no And the optimization cost is non-negligible anyway as That being said, I instrumented the memmove routine |
self memmove is not rare during all.bash, but the count
is almost never larger than 0x20 bytes.
Most of the cases come from math/big, where the API
always takes a result parameter, and in a lot of cases,
the final step is to copy the result into the result parameter.
We will need a more extensive benchmark analysis to
see whether memmove should special case this.
|
Ah, I see! The same memmove is used in many cases, other than slice-copy, slice-append, and the likes. A lot of them may be very small-size moves, where the cost of adding the check could be considerable (or maybe not). We could consider adding the check only for slice appends (which is more complicated that adding a simple compare to memmove). In this case, though, it would mean adding a couple of extra instructions at every append call site... not an obvious gain. |
What if we add the test to memmove for all but the smallest sizes? Just a thought. |
Adding the test above a size threshold would certainly bound the overhead On Tue, Oct 25, 2016 at 12:53 PM, Nick Patavalis notifications@github.com
|
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?go version go1.7.3 linux/amd64
What operating system and processor architecture are you using (
go env
)?GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/npat"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build296609692=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
What did you do?
It seems that the built-in append does not check for a situation like the following (appending consecutive slices, with the same underlying array) , and needlessly copies data, when it could avoid it.
If I'm following the breadcrumbs correctly, they lead to the fact that
runtime-memmove()
does not short-circuit when source and destination addresses are the same (I may be wrong, though).Running the simple benchmarks at:
https://gist.github.com/npat-efault/f055654633f43d0ef771d38657fd499e
for any value of N, demonstrates this.
What did you expect to see?
Built-in append detecting the trivial case and behaving (performance-wise) like
appendBytes
above, running time being independent of sliceb
sizeWhat did you see instead?
Running time increases with slice
b
size.The text was updated successfully, but these errors were encountered: