cmd/compile: copy() from C.malloc-ed buffer twice as slow as copying from native go buffer on Linux #49618
Comments
I do not see this on my Linux laptop running Debian rodete.
It might be worth trying to express this as a benchmark in a Go test to see if the issue can be recreated that way. |
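A minimal sketch of what such a benchmark could look like, assuming only the native-to-native copy is measured. The helper name `benchCopy` and the buffer sizes are illustrative, not taken from the issue. It uses `testing.Benchmark` so it can run as a plain program. The C.malloc side is not covered here and would have to be measured separately.

```go
package main

import (
	"fmt"
	"testing"
)

// benchCopy returns a benchmark function that copies size bytes
// from one native Go buffer to another on every iteration.
func benchCopy(size int) func(b *testing.B) {
	return func(b *testing.B) {
		src := make([]byte, size)
		dst := make([]byte, size)
		b.SetBytes(int64(size)) // lets the result report throughput in MB/s
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			copy(dst, src)
		}
	}
}

func main() {
	// Sizes are hypothetical picks: below, at, and above 1 MiB.
	for _, kib := range []int{32, 1024, 2048} {
		r := testing.Benchmark(benchCopy(kib * 1024))
		fmt.Printf("%5d KiB: %s\n", kib, r)
	}
}
```

Comparing the reported throughput across several runs of the same size would show whether the slowdown appears without any C code involved.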
Would it help if I add the assembly from gdb for main.go for the above 2 cases? |
Sure, it might help. Thanks. |
I can reproduce on my desktop, although it is more like 50% slower, not 2x slower. I suspect the cause is hugepages. Try allocating the C buffer using
Slow code profile is all here:
Fast code profile is all here:
Note that the former is read-limited (reading from the C buffer), whereas the latter is write limited. |
Indeed, bumping the allocation up to 2MB (the size of an x86 huge page) erases the performance delta. |
@randall77 Can you please post the code change you made so I can try it? Thanks. |
Changed the allocated buffer size to 2MB. It made things take much longer, but the problem remains.
|
Making the buffer bigger was:
Getting a profile was
You can then look at the profile with
|
Thanks for the pprof info. Note that I'm running the test on a VM.
|
I still think this is going to be something different between malloc and our allocator with respect to huge pages. Another thing to check is print the pointer values in both cases. Possibly there's an alignment issue that makes the copy slower. It almost certainly isn't a different amount of actual copying. It's just the copying that is happening is slower. |
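One way to check the pointer values for an alignment difference is to report the largest power of two that divides each buffer address. The sketch below does this for a native Go buffer. The helper `alignmentOf` is a hypothetical name introduced for illustration, and the same function could be applied to an address obtained from C.

```go
package main

import (
	"fmt"
	"math/bits"
	"unsafe"
)

// alignmentOf returns the largest power of two that divides addr,
// or 0 for a nil address.
func alignmentOf(addr uintptr) uintptr {
	if addr == 0 {
		return 0
	}
	return uintptr(1) << bits.TrailingZeros64(uint64(addr))
}

func main() {
	buf := make([]byte, 1<<20)
	addr := uintptr(unsafe.Pointer(&buf[0]))
	fmt.Printf("buf: %#x, aligned to %d bytes\n", addr, alignmentOf(addr))
}
```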
Allocating an 8K-aligned buffer did not make a difference. It takes the same time as the malloc-ed buffer.
|
Looks like the VM has hugepage support, but currently there are no hugepages allocated.
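For reference, a small sketch of how the huge-page counters can be pulled out of `/proc/meminfo` on Linux. The helper `hugePageFields` is a hypothetical name, and the program simply reports an error on systems where that file does not exist.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hugePageFields returns the lines of /proc/meminfo-style text whose
// key mentions huge pages, as a key -> value map.
func hugePageFields(meminfo string) map[string]string {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), ":", 2)
		if len(parts) != 2 {
			continue
		}
		if strings.Contains(strings.ToLower(parts[0]), "huge") {
			out[parts[0]] = strings.TrimSpace(parts[1])
		}
	}
	return out
}

func main() {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Println("cannot read /proc/meminfo:", err)
		return
	}
	for k, v := range hugePageFields(string(data)) {
		fmt.Printf("%s: %s\n", k, v)
	}
}
```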
|
Using mmap did not make a difference either
|
Is it possible to pass a native go buffer allocated with make() to a C function & have that function copy data into it? |
Disabling hugepages didn't help
|
Try allocating a 2MB-aligned and sized buffer. That's what the Go runtime does. |
A 2MB-aligned and 2MB-sized buffer did not help. Here's something interesting: with a 32KB buffer, copy performance can be bad even when copying from one native buffer to another, so the C side can be taken out of this. In the first 2 runs below, the buffer pointers are the same, but one took much longer than the other.
|
Take this back. Even with better alignment, performance can be bad.
|
native buf copy code

package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	buf_size_kib_ptr := flag.Int("buf_size", 1024, "buf size in KiB")
	loop_count_ptr := flag.Int("count", 100000, "number of times to copy to go buf")
	profile_ptr := flag.Bool("profile", false, "profile the copy loop")
	flag.Parse()

	buf_size_kib := *buf_size_kib_ptr
	loop_count := *loop_count_ptr
	profile := *profile_ptr
	buf_size_go := buf_size_kib * 1024

	// Source and destination are both native Go buffers of the same size.
	out_buf := make([]byte, buf_size_go)
	go_buf := make([]byte, buf_size_go)

	fmt.Println("buf_size: ", buf_size_kib, "KiB;", buf_size_go, "bytes")
	fmt.Printf("in_buf: %p out_buf: %p\n", go_buf, out_buf)
	fmt.Println("copying: count=", loop_count)

	if profile {
		// Write a CPU profile of the copy loop to cpu.pprof.
		f, err := os.Create("cpu.pprof")
		if err != nil {
			log.Fatal(err)
		}
		defer f.Close()
		if err := pprof.StartCPUProfile(f); err != nil {
			log.Fatal(err)
		}
		defer pprof.StopCPUProfile()
	}

	for i := 0; i < loop_count; i++ {
		copy(out_buf, go_buf)
	}
}
|
I'm going to close this issue, as I don't think it is a bug in Go. |
Sorry, your response is disappointing |
I'm curious, but I've got a lot of stuff to do and a limited time budget. |
No problem. Thanks for your time. I'm new to go, so it was at least a learning experience. If I find something I'll update this. Let me ask you one more question. |
Typically slice backing arrays are only replaced when growing capacity. That's not language enforced though, just normal practice. A programmer could reassign the backing array at any time. |
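A short illustration of the point about backing arrays: `append` keeps the same backing array while the capacity suffices and allocates a new one once the capacity is exceeded. The helper `appendMoves` is a hypothetical name used only for this sketch.

```go
package main

import "fmt"

// appendMoves reports whether appending one byte to a slice with the
// given length and capacity moves it to a different backing array.
func appendMoves(length, capacity int) bool {
	s := make([]byte, length, capacity)
	before := &s[0]
	s = append(s, 1)
	return before != &s[0]
}

func main() {
	fmt.Println("len 4, cap 4:", appendMoves(4, 4)) // capacity exceeded
	fmt.Println("len 4, cap 8:", appendMoves(4, 8)) // room left
}
```

A buffer that is never appended to or reassigned therefore keeps the same backing array for as long as the slice is in use.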
Thank you. For my use case (the Go buffer passed to C is a plain byte array that contains no pointers of any kind, C or Go), it looks like it's OK to pass the Go buffer pointer to C and avoid C.malloc.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
What did you expect to see?
Expected the time taken to copy from a C.malloc-ed buffer to be the same as copying from a native Go buffer.
What did you see instead?
Copying from a C.malloc-ed buffer can be more than twice as slow as copying from a native Go buffer.
Note that macOS does not seem to have this issue
It appears to be specific to Linux