New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile, runtime: go v1.8 fails to allocate heap bitmap on armv7 when importing zmq4 #19831
Comments
This works for me when compiling your sample program natively on a
raspberry pi. Have you tried building locally on your arm system? It's
probably going to be simpler than maintaining a cross compilation
environment.
…On Tue, Apr 4, 2017 at 10:54 AM, Chris Johnson ***@***.***> wrote:
What version of Go are you using (go version)?
1.8 linux/amd64
go env ($TOOLCHAIN and $SYSROOT are for cross compile)
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
CGO_ENABLED=1
C_INCLUDE_PATH=$SYSROOT/usr/include/
LIBRARY_PATH=$SYSROOT/usr/lib
CC=$TOOLCHAIN
PKG_CONFIG_PATH=$SYSROOT/usr/lib/pkgconfig/
GOARCH=arm
GOARM=7
I'm using go v1.8 to cross compile to an armv7 target and everything works
until I import zeromq, and then not even this works:
package main
import (
zmq "github.com/pebbe/zmq4"
"fmt"
)
func main() {
fmt.Println(zmq.Version())
}
Running it on the target gives the following:
./zmq
fatal error: runtime: out of memory
runtime stack:
runtime.throw(0xc8d3d, 0x16)
/usr/lib/go/src/runtime/panic.go:596 +0x70 fp=0x7eb018a0 sp=0x7eb01894
runtime.sysMap(0x6d752000, 0x74d4000, 0x1, 0x146b60)
/usr/lib/go/src/runtime/mem_linux.go:227 +0xb0 fp=0x7eb018c4 sp=0x7eb018a0
runtime.(*mheap).mapBits(0x139d58, 0x74d26000)
/usr/lib/go/src/runtime/mbitmap.go:159 +0x94 fp=0x7eb018dc sp=0x7eb018c4
runtime.(*mheap).sysAlloc(0x139d58, 0x100000, 0x30414)
/usr/lib/go/src/runtime/malloc.go:428 +0x2dc fp=0x7eb0191c sp=0x7eb018dc
runtime.(*mheap).grow(0x139d58, 0x8, 0x0)
/usr/lib/go/src/runtime/mheap.go:774 +0xd0 fp=0x7eb0194c sp=0x7eb0191c
runtime.(*mheap).allocSpanLocked(0x139d58, 0x1, 0x0)
/usr/lib/go/src/runtime/mheap.go:678 +0x468 fp=0x7eb0196c sp=0x7eb0194c
runtime.(*mheap).alloc_m(0x139d58, 0x1, 0x10, 0x0, 0x0)
/usr/lib/go/src/runtime/mheap.go:562 +0xdc fp=0x7eb019a4 sp=0x7eb0196c
runtime.(*mheap).alloc.func1()
/usr/lib/go/src/runtime/mheap.go:627 +0x3c fp=0x7eb019c0 sp=0x7eb019a4
runtime.systemstack(0x7eb019d4)
/usr/lib/go/src/runtime/asm_arm.s:278 +0xa8 fp=0x7eb019c4 sp=0x7eb019c0
runtime.(*mheap).alloc(0x139d58, 0x1, 0x10, 0x100, 0x1)
/usr/lib/go/src/runtime/mheap.go:628 +0x60 fp=0x7eb019ec sp=0x7eb019c4
runtime.(*mcentral).grow(0x13ab98, 0x0)
/usr/lib/go/src/runtime/mcentral.go:212 +0x84 fp=0x7eb01a18 sp=0x7eb019ec
runtime.(*mcentral).cacheSpan(0x13ab98, 0x146b68)
/usr/lib/go/src/runtime/mcentral.go:93 +0x104 fp=0x7eb01a5c sp=0x7eb01a18
runtime.(*mcache).refill(0x649e5000, 0x10, 0x146b68)
/usr/lib/go/src/runtime/mcache.go:122 +0x7c fp=0x7eb01a70 sp=0x7eb01a5c
runtime.(*mcache).nextFree.func1()
/usr/lib/go/src/runtime/malloc.go:525 +0x24 fp=0x7eb01a80 sp=0x7eb01a70
runtime.systemstack(0x7eb01aa4)
/usr/lib/go/src/runtime/asm_arm.s:278 +0xa8 fp=0x7eb01a84 sp=0x7eb01a80
runtime.(*mcache).nextFree(0x649e5000, 0x649e5010, 0x7eb01acc, 0x1ebd4, 0x21c40)
/usr/lib/go/src/runtime/malloc.go:526 +0x9c fp=0x7eb01ab0 sp=0x7eb01a84
runtime.mallocgc(0xf0, 0xc2548, 0x1, 0x64a26001)
/usr/lib/go/src/runtime/malloc.go:678 +0x8c0 fp=0x7eb01b08 sp=0x7eb01ab0
runtime.newobject(0xc2548, 0x1384a0)
/usr/lib/go/src/runtime/malloc.go:807 +0x2c fp=0x7eb01b1c sp=0x7eb01b08
runtime.malg(0x8000, 0x2710)
/usr/lib/go/src/runtime/proc.go:2821 +0x1c fp=0x7eb01b38 sp=0x7eb01b1c
runtime.mpreinit(0x138730)
/usr/lib/go/src/runtime/os_linux.go:302 +0x1c fp=0x7eb01b44 sp=0x7eb01b38
runtime.mcommoninit(0x138730)
/usr/lib/go/src/runtime/proc.go:540 +0x94 fp=0x7eb01b5c sp=0x7eb01b44
runtime.schedinit()
/usr/lib/go/src/runtime/proc.go:476 +0x40 fp=0x7eb01b78 sp=0x7eb01b5c
runtime.rt0_go(0x7eb01d14, 0x76ee4000, 0x7eb01d14, 0x1, 0x5cd10, 0x7eb01c58, 0x76f8e000, 0xcef6f281, 0xc6a63e08, 0x128, ...)
/usr/lib/go/src/runtime/asm_arm.s:61 +0x84 fp=0x7eb01bb8 sp=0x7eb01b78
Tracing shows the offending mmap:
mmap2(0x74c54000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x74c54000
mmap2(0x6d77e000, 122511360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
The first one is right, that's the chunk size, but the second is where it
allocates the bitmap. It should be probably 16x smaller.
I tried debugging but it works under GDB. Maybe GDB does some
virtualization?
gdbserver :1000 zmq
Process zmq created; pid = 817
Listening on port 1000
Remote debugging from host 10.0.0.36
4 2 1
Child exited with status 0
So far, all pure go programs I've tested work as well.
Why does importing zeromq cause it to fail? Is there a bug in the arm port
of go?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#19831>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAcA-ghneoorfxtrG1WQlPURRCd6SrRks5rsZThgaJpZM4MyRGu>
.
|
That's interesting, but my target is pretty limited, as in <1ghz, <512 megs of ram. Go seemed a good fit since it seems to have great support for cross compiling, and this is the only case where it is not working as expected. It's also possible the pi still doing it and not noticed, as the pi has much more ram than my board and could afford to allocate this much. This issue is mainly about how unexpected this is. The bitmap should be 16x less than the chunk, not 116x larger. It makes me think something basic is not set correctly in the runtime. |
It happens with the native build on the pi as well, here is the trace:
It's 16x larger in this case, not as bad as my target but still unexpected as it should be 16x smaller. For comparison, this is the same trace on my desktop (go 1.8 linux/amd64):
On linux/amd64 the bitmap is 32x smaller than the allocated chunk, as expected. |
I think you're being selective here. I wrote a simple hello world go
program, no cgo, no zmq4, and strace -f reported a dozen mmap calls, one
for over 800mb.
That is to say, mmap is a symptom, not the cause of this bug.
…On Wed, 5 Apr 2017, 04:53 Chris Johnson ***@***.***> wrote:
It happens with the native build on the pi as well, here is the trace:
mmap2(0x10400000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x10400000
mmap2(0xf3b0000, 17104896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf3b0000
It's 16x larger in this case, not as bad as my target but still unexpected
as it should be 16x smaller.
For comparison, this is the same trace on my desktop (go 1.8 linux/amd64):
mmap(0xc420000000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc420000000
mmap(0xc41fff8000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xc41fff8000
On linux/amd64 the bitmap is 32x smaller than the allocated chunk, as
expected.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#19831 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAAcA5_BcRLa1FKWFfgyehmiPk0lr3oEks5rspGogaJpZM4MyRGu>
.
|
The size of mmap calls in general is not an issue, but unlike the other calls this one uses MMAP_FIXED which actually allocates the memory instead of just addressing it. That the bitmap allocation is much larger than the heap chunk seems to me to be an obvious bug. To be more general it happens on the pi with a simple hello world:
|
I don't think this is the cause.
I thought the problem was the same program compiled natively vs cross
compiled behaved differently. Is that the issue?
…On Wed, 5 Apr 2017, 05:40 Chris Johnson ***@***.***> wrote:
The size of mmap calls in general is not an issue, but unlike the other
calls this one uses MMAP_FIXED which actually allocates the memory instead
of just addressing it.
That the bitmap allocation is much larger than the heap chunk seems to me
to be an obvious bug.
To be more general it happens on the pi with a simple hello world:
***@***.***:~$ cat hi.go
package main
import "fmt"
func main () {
fmt.Println("hi")
}
***@***.***:~$ strace -e mmap2 ./hi
mmap2(0x200000, 807411712, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x200000
mmap2(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f37000
mmap2(0x10400000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x10400000
mmap2(0xf3b0000, 17104896, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf3b0000
^
mmap2(0x200000, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x200000
mmap2(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f27000
mmap2(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f17000
mmap2(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76ed7000
hi
+++ exited with 0 +++
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#19831 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAAcA4UcHlwbtxkYjz3Z2zJCIjv_mXzdks5rspybgaJpZM4MyRGu>
.
|
@davecheney, could you elaborate on your simple reproducer? I tried a simple hello world on linux/amd64 and all of the mmaps were completely reasonable (the largest was 1 MB). @plasticchris, you're right that something is definitely going wrong here. Working out your stack trace from the bottom to the top:
Most likely something went wrong computing the new arena_used and that propagated to a bad bitmap mapping.
MMAP_FIXED just means that the kernel must make the mapping where we requested it. It has nothing to do with pre-faulting the memory. |
|
'Doh, mmap2. Thanks @davecheney. I think this is a 32-bit bug, because I get the same 800MB mmap if I run that program on 386. |
Thanks for the clarification, that the mmap returned ENOMEM make me think it was. |
Correct, this happens before any it gets to the main function |
Actually, @davecheney's example is different and not a bug. In that case, the 800MB mmap is |
Sorry, I wasn't clear, I didn't mean to imply its was a bug. My goal is to
drive the conversation back to the difference in behaviour of the program
when compiled natively vs cross compiled.
…On Wed, 5 Apr 2017, 05:57 Austin Clements ***@***.***> wrote:
Actually, @davecheney <https://github.com/davecheney>'s example is
different and not a bug. In that case, the 800MB mmap is PROT_NONE, so
that's actually coming from sysReserve. That's expected behavior, since on
32-bit we reserve space for a 512MB heap plus ancillary structures first
thing (but it's PROT_NONE, so it's not "really" there). OTOH, in
@plasticchris <https://github.com/plasticchris>' original post, the bad
mmap was clearly coming from a real mapping of the heap bitmap when growing
the heap.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19831 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAAcA92jDCZr_98yVenAbkIgnNdllgEyks5rsqC2gaJpZM4MyRGu>
.
|
@plasticchris, can you post the output of |
Here's my hunch: on 32-bit, we reserve space for enough heap bitmap to cover the entire 32-bit address space and we set arena_start to 0. When we map the bitmap, we have to map bitmap memory for everything from arena_start to the current top of the arena. On 64-bit, there's no waste here because the heap really starts at arena_start. On 32-bit, there's generally very little waste here because we start allocating the heap from just above the top of the mapped binary, which is generally pretty close to 0. But your heap is starting somewhere around 0x74c54000. Nowhere near 0. That either means that the top of your mapped binary is in fact nowhere near 0 (which could have to do with linking in zmq) or zmq mapped some memory in the way of our heap and we fell back to whatever address the OS could provide, which was something huge. This is causing us to create a large bitmap mapping. Now, why the bitmap mmap is failing with ENOMEM is another question. I think that either means overcommit is disabled on your Pi, or something (possibly zmq) made a huge number of mappings and this one happens to exceed the limit. @plasticchris, what are the values of |
Sorry, one more question: how many mmap calls are in your strace? (Or, if you can get it, how many lines are in |
It seems to me this isn't a cross compilation issue but an arm or possibly 32 bit issue as it shows up in native compiles for the pi. I posted the trace for hello world on the pi above and marked the mmaps. It's not as extreme but still present.
|
|
On the pi:
|
On my board:
|
Just to confirm, does either or both the natively compiled or cross
compiled binaries crash on your pi?
…On Wed, 5 Apr 2017, 07:04 Chris Johnson ***@***.***> wrote:
@plasticchris <https://github.com/plasticchris>, what are the values of
/proc/sys/vm/overcommit_memory, /proc/sys/vm/overcommit_ratio, and sysctl
vm.max_map_count?
On the pi:
***@***.***:~ $ cat /proc/sys/vm/overcommit_memory
0
***@***.***:~ $ cat /proc/sys/vm/overcommit_ratio
50
***@***.***:~ $ sysctl vm.max_map_count
vm.max_map_count = 65530
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19831 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAAcA1tguLSq0exE6hL7720MKwsOwQtbks5rsrB6gaJpZM4MyRGu>
.
|
Both run on the pi, they allocate more memory than expected for the bitmap but not enough to crash it. On my board both the natively and cross compiled binaries crash at the same point:
|
Ok, thanks for confirming this. I mistakenly thought that different
compilation strategies produced a binary that crashed.
…On Wed, 5 Apr 2017, 07:30 Chris Johnson ***@***.***> wrote:
Both run on the pi, they allocate more memory than expected for the bitmap
but not enough to crash it. On my board both the natively and cross
compiled binaries crash at the same point:
# strace -e mmap2 ./zmq_native
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76ff7000
mmap2(NULL, 771, PROT_READ, MAP_PRIVATE, 3, 0) = 0x76ff6000
mmap2(NULL, 694688, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76f27000
mmap2(0x76fcd000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x96000) = 0x76fcd000
mmap2(NULL, 143960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76f03000
mmap2(0x76f23000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x10000) = 0x76f23000
mmap2(0x76f25000, 4696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76f25000
mmap2(NULL, 984424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76e12000
mmap2(0x76efd000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xdb000) = 0x76efd000
mmap2(0x76f00000, 9576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76f00000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76ff5000
mmap2(NULL, 86544, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76dfc000
mmap2(0x76e10000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x76e10000
mmap2(NULL, 1079440, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76cf4000
mmap2(0x76df3000, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xef000) = 0x76df3000
mmap2(0x76dfa000, 6288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76dfa000
mmap2(NULL, 479376, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76c7e000
mmap2(0x76cf2000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x64000) = 0x76cf2000
mmap2(NULL, 164412, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76c55000
mmap2(0x76c7d000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x76c7d000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76ff4000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76ff3000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76ff2000
mmap2(0x200000, 807411712, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x46a53000
mmap2(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x46a13000
mmap2(0x56c54000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x56c54000
mmap2(0x5157e000, 91054080, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
fatal error: runtime: out of memory
runtime stack:
runtime.throw(0xca163, 0x16)
/usr/local/go/src/runtime/panic.go:596 +0x70 fp=0x7ecf8890 sp=0x7ecf8884
runtime.sysMap(0x5157e000, 0x56d6000, 0x1, 0x147b58)
/usr/local/go/src/runtime/mem_linux.go:227 +0xb0 fp=0x7ecf88b4 sp=0x7ecf8890
runtime.(*mheap).mapBits(0x13ad50, 0x56d54000)
/usr/local/go/src/runtime/mbitmap.go:159 +0x94 fp=0x7ecf88cc sp=0x7ecf88b4
runtime.(*mheap).sysAlloc(0x13ad50, 0x100000, 0x3049c)
/usr/local/go/src/runtime/malloc.go:429 +0x2dc fp=0x7ecf890c sp=0x7ecf88cc
runtime.(*mheap).grow(0x13ad50, 0x8, 0x0)
/usr/local/go/src/runtime/mheap.go:774 +0x48 fp=0x7ecf893c sp=0x7ecf890c
runtime.(*mheap).allocSpanLocked(0x13ad50, 0x1, 0x0)
/usr/local/go/src/runtime/mheap.go:678 +0x468 fp=0x7ecf895c sp=0x7ecf893c
runtime.(*mheap).alloc_m(0x13ad50, 0x1, 0x10, 0x0, 0x0)
/usr/local/go/src/runtime/mheap.go:562 +0xdc fp=0x7ecf8994 sp=0x7ecf895c
runtime.(*mheap).alloc.func1()
/usr/local/go/src/runtime/mheap.go:627 +0x3c fp=0x7ecf89b0 sp=0x7ecf8994
runtime.systemstack(0x7ecf89c4)
/usr/local/go/src/runtime/asm_arm.s:278 +0xa8 fp=0x7ecf89b4 sp=0x7ecf89b0
runtime.(*mheap).alloc(0x13ad50, 0x1, 0x10, 0x100, 0x1)
/usr/local/go/src/runtime/mheap.go:628 +0x60 fp=0x7ecf89dc sp=0x7ecf89b4
runtime.(*mcentral).grow(0x13bb90, 0x0)
/usr/local/go/src/runtime/mcentral.go:212 +0x84 fp=0x7ecf8a08 sp=0x7ecf89dc
runtime.(*mcentral).cacheSpan(0x13bb90, 0x147b60)
/usr/local/go/src/runtime/mcentral.go:93 +0x104 fp=0x7ecf8a4c sp=0x7ecf8a08
runtime.(*mcache).refill(0x46a13000, 0x10, 0x147b60)
/usr/local/go/src/runtime/mcache.go:122 +0x7c fp=0x7ecf8a60 sp=0x7ecf8a4c
runtime.(*mcache).nextFree.func1()
/usr/local/go/src/runtime/malloc.go:526 +0x24 fp=0x7ecf8a70 sp=0x7ecf8a60
runtime.systemstack(0x7ecf8a94)
/usr/local/go/src/runtime/asm_arm.s:278 +0xa8 fp=0x7ecf8a74 sp=0x7ecf8a70
runtime.(*mcache).nextFree(0x46a13000, 0x46a13010, 0x7ecf8abc, 0x1ec5c, 0x21cc8)
/usr/local/go/src/runtime/malloc.go:527 +0x9c fp=0x7ecf8aa0 sp=0x7ecf8a74
runtime.mallocgc(0xf0, 0xc3980, 0x1, 0x46a54001)
/usr/local/go/src/runtime/malloc.go:679 +0x8c0 fp=0x7ecf8af8 sp=0x7ecf8aa0
runtime.newobject(0xc3980, 0x139498)
/usr/local/go/src/runtime/malloc.go:808 +0x2c fp=0x7ecf8b0c sp=0x7ecf8af8
runtime.malg(0x8000, 0x2710)
/usr/local/go/src/runtime/proc.go:2821 +0x1c fp=0x7ecf8b28 sp=0x7ecf8b0c
runtime.mpreinit(0x139728)
/usr/local/go/src/runtime/os_linux.go:302 +0x1c fp=0x7ecf8b34 sp=0x7ecf8b28
runtime.mcommoninit(0x139728)
/usr/local/go/src/runtime/proc.go:540 +0x94 fp=0x7ecf8b4c sp=0x7ecf8b34
runtime.schedinit()
/usr/local/go/src/runtime/proc.go:476 +0x40 fp=0x7ecf8b68 sp=0x7ecf8b4c
runtime.rt0_go(0x7ecf8d04, 0x76eff000, 0x7ecf8d04, 0x1, 0x5cd0c, 0x7ecf8c48, 0x76ffa000, 0xa3b102d0, 0xab9c0e29, 0x128, ...)
/usr/local/go/src/runtime/asm_arm.s:61 +0x84 fp=0x7ecf8ba8 sp=0x7ecf8b68
+++ exited with 2 +++
# strace -e mmap2 ./zmq
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f7f000
mmap2(NULL, 771, PROT_READ, MAP_PRIVATE, 3, 0) = 0x76f7e000
mmap2(NULL, 362376, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76f00000
mmap2(0x76f56000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x46000) = 0x76f56000
mmap2(NULL, 143960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76edc000
mmap2(0x76efc000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x10000) = 0x76efc000
mmap2(0x76efe000, 4696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76efe000
mmap2(NULL, 984424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76deb000
mmap2(0x76ed6000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xdb000) = 0x76ed6000
mmap2(0x76ed9000, 9576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76ed9000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f7d000
mmap2(NULL, 86544, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76dd5000
mmap2(0x76de9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x76de9000
mmap2(NULL, 73912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76dc2000
mmap2(0x76dd3000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x76dd3000
mmap2(NULL, 1079440, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76cba000
mmap2(0x76db9000, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xef000) = 0x76db9000
mmap2(0x76dc0000, 6288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76dc0000
mmap2(NULL, 479376, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76c44000
mmap2(0x76cb8000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x64000) = 0x76cb8000
mmap2(NULL, 164412, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76c1b000
mmap2(0x76c43000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x76c43000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f7c000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f7b000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f7a000
mmap2(0x200000, 304095232, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x64a19000
mmap2(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x649d9000
runtime: asks for:1048576-byte block (0 in use)
mmap2(0x74c1a000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x74c1a000
mmap2(0x6d748000, 122494976, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
fatal error: runtime: out of memory
runtime stack:
runtime.throw(0xc8d3d, 0x16)
/usr/lib/go/src/runtime/panic.go:596 +0x70 fp=0x7eb198a0 sp=0x7eb19894
runtime.sysMap(0x6d748000, 0x74d2000, 0x1, 0x146b60)
/usr/lib/go/src/runtime/mem_linux.go:227 +0xb0 fp=0x7eb198c4 sp=0x7eb198a0
runtime.(*mheap).mapBits(0x139d58, 0x74d1a000)
/usr/lib/go/src/runtime/mbitmap.go:159 +0x94 fp=0x7eb198dc sp=0x7eb198c4
runtime.(*mheap).sysAlloc(0x139d58, 0x100000, 0x30414)
/usr/lib/go/src/runtime/malloc.go:428 +0x2dc fp=0x7eb1991c sp=0x7eb198dc
runtime.(*mheap).grow(0x139d58, 0x8, 0x0)
/usr/lib/go/src/runtime/mheap.go:774 +0xd0 fp=0x7eb1994c sp=0x7eb1991c
runtime.(*mheap).allocSpanLocked(0x139d58, 0x1, 0x0)
/usr/lib/go/src/runtime/mheap.go:678 +0x468 fp=0x7eb1996c sp=0x7eb1994c
runtime.(*mheap).alloc_m(0x139d58, 0x1, 0x10, 0x0, 0x0)
/usr/lib/go/src/runtime/mheap.go:562 +0xdc fp=0x7eb199a4 sp=0x7eb1996c
runtime.(*mheap).alloc.func1()
/usr/lib/go/src/runtime/mheap.go:627 +0x3c fp=0x7eb199c0 sp=0x7eb199a4
runtime.systemstack(0x7eb199d4)
/usr/lib/go/src/runtime/asm_arm.s:278 +0xa8 fp=0x7eb199c4 sp=0x7eb199c0
runtime.(*mheap).alloc(0x139d58, 0x1, 0x10, 0x100, 0x1)
/usr/lib/go/src/runtime/mheap.go:628 +0x60 fp=0x7eb199ec sp=0x7eb199c4
runtime.(*mcentral).grow(0x13ab98, 0x0)
/usr/lib/go/src/runtime/mcentral.go:212 +0x84 fp=0x7eb19a18 sp=0x7eb199ec
runtime.(*mcentral).cacheSpan(0x13ab98, 0x146b68)
/usr/lib/go/src/runtime/mcentral.go:93 +0x104 fp=0x7eb19a5c sp=0x7eb19a18
runtime.(*mcache).refill(0x649d9000, 0x10, 0x146b68)
/usr/lib/go/src/runtime/mcache.go:122 +0x7c fp=0x7eb19a70 sp=0x7eb19a5c
runtime.(*mcache).nextFree.func1()
/usr/lib/go/src/runtime/malloc.go:525 +0x24 fp=0x7eb19a80 sp=0x7eb19a70
runtime.systemstack(0x7eb19aa4)
/usr/lib/go/src/runtime/asm_arm.s:278 +0xa8 fp=0x7eb19a84 sp=0x7eb19a80
runtime.(*mcache).nextFree(0x649d9000, 0x649d9010, 0x7eb19acc, 0x1ebd4, 0x21c40)
/usr/lib/go/src/runtime/malloc.go:526 +0x9c fp=0x7eb19ab0 sp=0x7eb19a84
runtime.mallocgc(0xf0, 0xc2548, 0x1, 0x64a1a001)
/usr/lib/go/src/runtime/malloc.go:678 +0x8c0 fp=0x7eb19b08 sp=0x7eb19ab0
runtime.newobject(0xc2548, 0x1384a0)
/usr/lib/go/src/runtime/malloc.go:807 +0x2c fp=0x7eb19b1c sp=0x7eb19b08
runtime.malg(0x8000, 0x2710)
/usr/lib/go/src/runtime/proc.go:2821 +0x1c fp=0x7eb19b38 sp=0x7eb19b1c
runtime.mpreinit(0x138730)
/usr/lib/go/src/runtime/os_linux.go:302 +0x1c fp=0x7eb19b44 sp=0x7eb19b38
runtime.mcommoninit(0x138730)
/usr/lib/go/src/runtime/proc.go:540 +0x94 fp=0x7eb19b5c sp=0x7eb19b44
runtime.schedinit()
/usr/lib/go/src/runtime/proc.go:476 +0x40 fp=0x7eb19b78 sp=0x7eb19b5c
runtime.rt0_go(0x7eb19d14, 0x76ed8000, 0x7eb19d14, 0x1, 0x5cd10, 0x7eb19c58, 0x76f82000, 0xe0e41da8, 0xe8b59121, 0x128, ...)
/usr/lib/go/src/runtime/asm_arm.s:61 +0x84 fp=0x7eb19bb8 sp=0x7eb19b78
+++ exited with 2 +++
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19831 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAAcAxTeNVRJ4Hh2PapjqCHiM8l4NcKtks5rsrZxgaJpZM4MyRGu>
.
|
This is where things first go wrong:
Everything before this must be C/zmq initialization. This is Unfortunately, this doesn't explain why it got mapped there. Based on your readelf output, the binary isn't mapped there, and based on your strace output none of the earlier mmaps are there either, so the kernel should have been able to give up the requested address. It also doesn't explain why the bitmap mmap is failing, since you have overcommit (heuristically) allowed and are clearly well under the mmap limit. Could you try running |
If I do this it works. I guess it was exceeding the overcommit. TIL Thanks for explaining! That makes sense. It seems to me this extra memory won't get faulted so it's not elegant but should work. Maybe issue a warning in this case? Or use the reservation's actual address? Unless assuming the heap starts at zero saves some cycles later on? |
There are no warnings in the runtime. It would seem odd for this to be the first one. :) I still don't understand why this is happening, though. From what you've posted, there's no obvious reason why the kernel is giving us a high address instead of the one we asked for. Could you tweak your simple program that printed the version of zmq to also print the contents of /proc/self/maps before exiting and post the output along with its strace?
It's not about cycles; it's about how much memory you can allocate. Starting the arena at zero means the Go heap can use the entire address space before running out of memory, which can be important in a limited 32-bit address space. We didn't used to do this, which meant programs OOM'ed after just 512 MB of allocation, even if there was plenty of RAM left. It would be nice if we only had to map the bitmap for the part we're actually using, but we don't have the structures to do that right now. However, I have been thinking about whether we could unmap parts the bitmap when we release heap memory back to the OS, and supporting that may make it easier to only map the bitmap we need in the first place. |
Here it is:
|
strace is a bit long:
|
Fair enough. I'm pretty new to go. |
Ah hah.
Something, presumably part of zmq, is growing the process brk before Go runtime initialization. For reasons I don't entirely understand, the brk mapping wound up right in the middle of the heap mapping Go asked for, which is why that failed and the kernel put the heap mapping somewhere else. @plasticchris, can you gdb your binary, set a breakpoint on A possible fix for this is for the runtime to query the brk and factor that in to where it asks to put the heap mapping. |
It never had a problem running under GDB, so I'm not sure if it is representative. |
That's too bad. Though the backtrace at least confirms that something is calling C malloc before Go runtime initialization. I took a look at Ubuntu's libzmq.so.3 (just because it was the easiest thing to get my hands on) and it does have some global constructors that the dynamic linker will run before invoking Go runtime initialization, so this isn't too surprising. |
Running it on the pi gives a bit more info:
|
@plasticchris, can you try patching in https://go-review.googlesource.com/39810 and see if that works with the default overcommit setting? It implements the idea of checking and avoiding the brk while setting up the Go heap. It'll only build on linux/arm and linux/386 right now since I haven't implemented the brk syscall on anything else (which is a real pain :( ). |
@aclements the patch works, mmap2 gets the address it requests. Here is the relevant part of the strace:
|
CL https://golang.org/cl/39810 mentions this issue. |
Change https://golang.org/cl/85887 mentions this issue: |
This replaces the contiguous heap arena mapping with a potentially sparse mapping that can support heap mappings anywhere in the address space. This has several advantages over the current approach: * There is no longer any limit on the size of the Go heap. (Currently it's limited to 512GB.) Hence, this fixes #10460. * It eliminates many failures modes of heap initialization and growing. In particular it eliminates any possibility of panicking with an address space conflict. This can happen for many reasons and even causes a low but steady rate of TSAN test failures because of conflicts with the TSAN runtime. See #16936 and #11993. * It eliminates the notion of "non-reserved" heap, which was added because creating huge address space reservations (particularly on 64-bit) led to huge process VSIZE. This was at best confusing and at worst conflicted badly with ulimit -v. However, the non-reserved heap logic is complicated, can race with other mappings in non-pure Go binaries (e.g., #18976), and requires that the entire heap be either reserved or non-reserved. We currently maintain the latter property, but it's quite difficult to convince yourself of that, and hence difficult to keep correct. This logic is still present, but will be removed in the next CL. * It fixes problems on 32-bit where skipping over parts of the address space leads to mapping huge (and never-to-be-used) metadata structures. See #19831. This also completely rewrites and significantly simplifies mheap.sysAlloc, which has been a source of many bugs. E.g., #21044, #20259, #18651, and #13143 (and maybe #23222). This change also makes it possible to allocate individual objects larger than 512GB. As a result, a few tests that expected huge allocations to fail needed to be changed to make even larger allocations. However, at the moment attempting to allocate a humongous object may cause the program to freeze for several minutes on Linux as we fall back to probing every page with addrspace_free. That logic (and this failure mode) will be removed in the next CL. Fixes #10460. Fixes #22204 (since it rewrites the code involved). This slightly slows down compilebench and the x/benchmarks garbage benchmark. name old time/op new time/op delta Template 184ms ± 1% 185ms ± 1% ~ (p=0.065 n=10+9) Unicode 86.9ms ± 3% 86.3ms ± 1% ~ (p=0.631 n=10+10) GoTypes 599ms ± 0% 602ms ± 0% +0.56% (p=0.000 n=10+9) Compiler 2.87s ± 1% 2.89s ± 1% +0.51% (p=0.002 n=9+10) SSA 7.29s ± 1% 7.25s ± 1% ~ (p=0.182 n=10+9) Flate 118ms ± 2% 118ms ± 1% ~ (p=0.113 n=9+9) GoParser 147ms ± 1% 148ms ± 1% +1.07% (p=0.003 n=9+10) Reflect 401ms ± 1% 404ms ± 1% +0.71% (p=0.003 n=10+9) Tar 175ms ± 1% 175ms ± 1% ~ (p=0.604 n=9+10) XML 209ms ± 1% 210ms ± 1% ~ (p=0.052 n=10+10) (https://perf.golang.org/search?q=upload:20171231.4) name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.23ms ± 1% 2.25ms ± 1% +0.84% (p=0.000 n=19+19) (https://perf.golang.org/search?q=upload:20171231.3) Relative to the start of the sparse heap changes (starting at and including "runtime: fix various contiguous bitmap assumptions"), overall slowdown is roughly 1% on GC-intensive benchmarks: name old time/op new time/op delta Template 183ms ± 1% 185ms ± 1% +1.32% (p=0.000 n=9+9) Unicode 84.9ms ± 2% 86.3ms ± 1% +1.65% (p=0.000 n=9+10) GoTypes 595ms ± 1% 602ms ± 0% +1.19% (p=0.000 n=9+9) Compiler 2.86s ± 0% 2.89s ± 1% +0.91% (p=0.000 n=9+10) SSA 7.19s ± 0% 7.25s ± 1% +0.75% (p=0.000 n=8+9) Flate 117ms ± 1% 118ms ± 1% +1.10% (p=0.000 n=10+9) GoParser 146ms ± 2% 148ms ± 1% +1.48% (p=0.002 n=10+10) Reflect 398ms ± 1% 404ms ± 1% +1.51% (p=0.000 n=10+9) Tar 173ms ± 1% 175ms ± 1% +1.17% (p=0.000 n=10+10) XML 208ms ± 1% 210ms ± 1% +0.62% (p=0.011 n=10+10) [Geo mean] 369ms 373ms +1.17% (https://perf.golang.org/search?q=upload:20180101.2) name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.22ms ± 1% 2.25ms ± 1% +1.51% (p=0.000 n=20+19) (https://perf.golang.org/search?q=upload:20180101.3) Change-Id: I5daf4cfec24b252e5a57001f0a6c03f22479d0f0 Reviewed-on: https://go-review.googlesource.com/85887 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
Change https://golang.org/cl/122587 mentions this issue: |
go version
1.8 linux/amd64
go env ($TOOLCHAIN and $SYSROOT are for cross compile)
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
CGO_ENABLED=1
C_INCLUDE_PATH=$SYSROOT/usr/include/
LIBRARY_PATH=$SYSROOT/usr/lib
CC=$TOOLCHAIN
PKG_CONFIG_PATH=$SYSROOT/usr/lib/pkgconfig/
GOARCH=arm
GOARM=7
I'm using go v1.8 to cross compile to an armv7 target and everything works until I import zeromq, and then not even this works:
Running it on the target gives the following:
Tracing shows the offending mmap:
The first one is right, that's the chunk size, but the second is where it allocates the bitmap. It should be probably 16x smaller.
I tried debugging but it works under GDB. Maybe GDB does some virtualization?
So far, all pure go programs I've tested work as well.
Why does importing zeromq cause it to fail? Is there a bug in the arm port of go?
The text was updated successfully, but these errors were encountered: