-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: arena mapping creates large core files and bad mlockall behavior #28081
Comments
Reproduced locally, but if I stat the file I'm pretty sure it's appearing as sparse. While the reported size is 100MB, (# of blocks) * (# of bytes per block) = 8640 * 4096 ~= 34K for me. |
Almost certainly related to the arena changes. These core files are large, but are very sparse, so their actual on-disk size is much smaller. |
If we care about fixing this, we probably need to map the arenas |
Ok, it sounds like disk space is not a huge issue. |
Huh, it's not smart enough to zero-share those pages, even if it's a read-only shared mapping. package main
import (
"log"
"os"
"syscall"
"time"
)
func main() {
const size = 512 << 20
f, _ := os.Create("/tmp/sparse")
f.Truncate(size)
p, _ := syscall.Mmap(int(f.Fd()), 0, size, syscall.PROT_READ, syscall.MAP_SHARED)
sum := 0
for i := range p {
sum += int(p[i])
}
if sum != 0 {
log.Fatal(sum)
}
time.Sleep(10 * time.Second)
} |
Hi Guys, Here is the backtrace for the 100M memory malloc. [root@10-25-51-148 ~]# pmap -x `pgrep mem-test` 17590: ./mem-test Address Kbytes RSS Dirty Mode Mapping 0000000000400000 316 244 0 r-x-- mem-test 000000000044f000 416 100 0 r---- mem-test 00000000004b7000 12 12 12 rw--- mem-test 00000000004ba000 124 36 36 rw--- [ anon ] 000000c000000000 65536 2048 2048 rw--- [ anon ] 00007f0f11962000 35264 2168 2168 rw--- [ anon ] 00007ffe11d76000 132 16 16 rw--- [ stack ] 00007ffe11dec000 8 4 0 r-x-- [ anon ] ffffffffff600000 4 0 0 r-x-- [ anon ] ---------------- ------- ------- ------- total kB 101812 4628 4280 (gdb) bt #0 runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450 #1 0x0000000000403261 in runtime.mmap (addr=0xc000000000, fd=-1, flags=34, n=67108864, off=0, prot=0, ~r6=, ~r7=) at /home/lizgao/go/src/runtime/cgo_mmap.go:39 #2 0x0000000000410d6e in runtime.sysReserve (n=, v=, ~r2=) at /home/lizgao/go/src/runtime/mem_linux.go:144 #3 0x00000000004091b9 in runtime.(*mheap).sysAlloc (h=0x4be660 , n=67108864, size=0, v=0x0) at /home/lizgao/go/src/runtime/malloc.go:541 #4 0x000000000041c372 in runtime.(*mheap).grow (h=0x4be660 , npage=, ~r1=) at /home/lizgao/go/src/runtime/mheap.go:919 #5 0x000000000041c1f7 in runtime.(*mheap).allocSpanLocked (h=0x4be660 , npage=1, stat=0x4d6c68 , ~r2=) at /home/lizgao/go/src/runtime/mheap.go:848 #6 0x000000000041ba09 in runtime.(*mheap).alloc_m (h=0x4be660 , large=false, npage=1, spanclass=42 '*', ~r3=) at /home/lizgao/go/src/runtime/mheap.go:692 #7 0x00000000004451ac in runtime.(*mheap).alloc.func1 () at /home/lizgao/go/src/runtime/mheap.go:759 #8 0x000000000041bcaa in runtime.(*mheap).alloc (h=, large=, needzero=, npage=, spanclass=, ~r4=) at /home/lizgao/go/src/runtime/mheap.go:758 #9 0x0000000000410874 in runtime.(*mcentral).grow (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:232 #10 0x00000000004103c8 in runtime.(*mcentral).cacheSpan (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:106 #11 0x000000000040ff85 in runtime.(*mcache).refill (c=0x7ffff7fbd000, spc=42 '*') at /home/lizgao/go/src/runtime/mcache.go:118 #12 0x0000000000444692 in runtime.(*mcache).nextFree.func1 () at /home/lizgao/go/src/runtime/malloc.go:749 #13 0x0000000000409536 in runtime.(*mcache).nextFree (c=0x7ffff7fbd000, s=, shouldhelpgc=, spc=42 '*', v=) at /home/lizgao/go/src/runtime/malloc.go:748 #14 0x0000000000409e83 in runtime.mallocgc (needzero=true, size=384, typ=0x46baa0, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:903 #15 0x000000000040a268 in runtime.newobject (typ=, ~r1=) at /home/lizgao/go/src/runtime/malloc.go:1032 #16 0x000000000042b411 in runtime.malg (stacksize=, ~r1=) at /home/lizgao/go/src/runtime/proc.go:3285 #17 0x0000000000420649 in runtime.mpreinit (mp=) at /home/lizgao/go/src/runtime/os_linux.go:311 #18 0x0000000000424ff1 in runtime.mcommoninit (mp=0x4ba3a0 ) at /home/lizgao/go/src/runtime/proc.go:624 #19 0x0000000000424cb9 in runtime.schedinit () at /home/lizgao/go/src/runtime/proc.go:546 #20 0x0000000000446b8a in runtime.rt0_go () at /home/lizgao/go/src/runtime/asm_amd64.s:195 #21 0x0000000000000001 in ?? () #22 0x00007fffffffe578 in ?? () #23 0x0000000000000001 in ?? () #24 0x00007fffffffe578 in ?? () #25 0x0000000000000000 in ?? () (gdb) c Continuing. Breakpoint 1, runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450 450 MOVQ addr+0(FP), DI (gdb) bt #0 runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450 #1 0x0000000000403261 in runtime.mmap (addr=0xc000000000, fd=-1, flags=50, n=67108864, off=0, prot=3, ~r6=, ~r7=) at /home/lizgao/go/src/runtime/cgo_mmap.go:39 #2 0x0000000000410e1a in runtime.sysMap (n=, sysStat=, v=0xc000000000) at /home/lizgao/go/src/runtime/mem_linux.go:154 #3 0x0000000000408e87 in runtime.(*mheap).sysAlloc (h=0x4be660 , n=67108864, size=, v=) at /home/lizgao/go/src/runtime/malloc.go:619 #4 0x000000000041c372 in runtime.(*mheap).grow (h=0x4be660 , npage=, ~r1=) at /home/lizgao/go/src/runtime/mheap.go:919 #5 0x000000000041c1f7 in runtime.(*mheap).allocSpanLocked (h=0x4be660 , npage=1, stat=0x4d6c68 , ~r2=) at /home/lizgao/go/src/runtime/mheap.go:848 #6 0x000000000041ba09 in runtime.(*mheap).alloc_m (h=0x4be660 , large=false, npage=1, spanclass=42 '*', ~r3=) at /home/lizgao/go/src/runtime/mheap.go:692 #7 0x00000000004451ac in runtime.(*mheap).alloc.func1 () at /home/lizgao/go/src/runtime/mheap.go:759 #8 0x000000000041bcaa in runtime.(*mheap).alloc (h=, large=, needzero=, npage=, spanclass=, ~r4=) at /home/lizgao/go/src/runtime/mheap.go:758 #9 0x0000000000410874 in runtime.(*mcentral).grow (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:232 #10 0x00000000004103c8 in runtime.(*mcentral).cacheSpan (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:106 #11 0x000000000040ff85 in runtime.(*mcache).refill (c=0x7ffff7fbd000, spc=42 '*') at /home/lizgao/go/src/runtime/mcache.go:118 #12 0x0000000000444692 in runtime.(*mcache).nextFree.func1 () at /home/lizgao/go/src/runtime/malloc.go:749 #13 0x0000000000409536 in runtime.(*mcache).nextFree (c=0x7ffff7fbd000, s=, shouldhelpgc=, spc=42 '*', v=) at /home/lizgao/go/src/runtime/malloc.go:748 #14 0x0000000000409e83 in runtime.mallocgc (needzero=true, size=384, typ=0x46baa0, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:903 #15 0x000000000040a268 in runtime.newobject (typ=, ~r1=) at /home/lizgao/go/src/runtime/malloc.go:1032 #16 0x000000000042b411 in runtime.malg (stacksize=, ~r1=) at /home/lizgao/go/src/runtime/proc.go:3285 #17 0x0000000000420649 in runtime.mpreinit (mp=) at /home/lizgao/go/src/runtime/os_linux.go:311 #18 0x0000000000424ff1 in runtime.mcommoninit (mp=0x4ba3a0 ) at /home/lizgao/go/src/runtime/proc.go:624 #19 0x0000000000424cb9 in runtime.schedinit () at /home/lizgao/go/src/runtime/proc.go:546 #20 0x0000000000446b8a in runtime.rt0_go () at /home/lizgao/go/src/runtime/asm_amd64.s:195 #21 0x0000000000000001 in ?? () #22 0x00007fffffffe578 in ?? () #23 0x0000000000000001 in ?? () #24 0x00007fffffffe578 in ?? () #25 0x0000000000000000 in ?? () (gdb) c Continuing. Breakpoint 1, runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450 450 MOVQ addr+0(FP), DI (gdb) bt #0 runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450 #1 0x0000000000403261 in runtime.mmap (addr=0x0, fd=-1, flags=34, n=33554432, off=0, prot=3, ~r6=, ~r7=) at /home/lizgao/go/src/runtime/cgo_mmap.go:39 #2 0x000000000041099e in runtime.sysAlloc (n=33554432, sysStat=, ~r2=) at /home/lizgao/go/src/runtime/mem_linux.go:21 #3 0x000000000040a899 in runtime.persistentalloc1 (align=67108864, size=33554432, sysStat=0x0, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:1172 #4 0x0000000000444765 in runtime.persistentalloc.func1 () at /home/lizgao/go/src/runtime/malloc.go:1143 #5 0x000000000040a632 in runtime.persistentalloc (align=, size=, sysStat=, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:1142 #6 0x0000000000408fce in runtime.(*mheap).sysAlloc (h=0x4be660 , n=67108864, size=, v=) at /home/lizgao/go/src/runtime/malloc.go:627 #7 0x000000000041c372 in runtime.(*mheap).grow (h=0x4be660 , npage=, ~r1=) at /home/lizgao/go/src/runtime/mheap.go:919 #8 0x000000000041c1f7 in runtime.(*mheap).allocSpanLocked (h=0x4be660 , npage=1, stat=0x4d6c68 , ~r2=) at /home/lizgao/go/src/runtime/mheap.go:848 #9 0x000000000041ba09 in runtime.(*mheap).alloc_m (h=0x4be660 , large=false, npage=1, spanclass=42 '*', ~r3=) at /home/lizgao/go/src/runtime/mheap.go:692 #10 0x00000000004451ac in runtime.(*mheap).alloc.func1 () at /home/lizgao/go/src/runtime/mheap.go:759 #11 0x000000000041bcaa in runtime.(*mheap).alloc (h=, large=, needzero=, npage=, spanclass=, ~r4=) at /home/lizgao/go/src/runtime/mheap.go:758 #12 0x0000000000410874 in runtime.(*mcentral).grow (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:232 #13 0x00000000004103c8 in runtime.(*mcentral).cacheSpan (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:106 #14 0x000000000040ff85 in runtime.(*mcache).refill (c=0x7ffff7fbd000, spc=42 '*') at /home/lizgao/go/src/runtime/mcache.go:118 #15 0x0000000000444692 in runtime.(*mcache).nextFree.func1 () at /home/lizgao/go/src/runtime/malloc.go:749 #16 0x0000000000409536 in runtime.(*mcache).nextFree (c=0x7ffff7fbd000, s=, shouldhelpgc=, spc=42 '*', v=) at /home/lizgao/go/src/runtime/malloc.go:748 #17 0x0000000000409e83 in runtime.mallocgc (needzero=true, size=384, typ=0x46baa0, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:903 #18 0x000000000040a268 in runtime.newobject (typ=, ~r1=) at /home/lizgao/go/src/runtime/malloc.go:1032 #19 0x000000000042b411 in runtime.malg (stacksize=, ~r1=) at /home/lizgao/go/src/runtime/proc.go:3285 #20 0x0000000000420649 in runtime.mpreinit (mp=) at /home/lizgao/go/src/runtime/os_linux.go:311 #21 0x0000000000424ff1 in runtime.mcommoninit (mp=0x4ba3a0 ) at /home/lizgao/go/src/runtime/proc.go:624 #22 0x0000000000424cb9 in runtime.schedinit () at /home/lizgao/go/src/runtime/proc.go:546 #23 0x0000000000446b8a in runtime.rt0_go () at /home/lizgao/go/src/runtime/asm_amd64.s:195 #24 0x0000000000000001 in ?? () #25 0x00007fffffffe578 in ?? () #26 0x0000000000000001 in ?? () #27 0x00007fffffffe578 in ?? () #28 0x0000000000000000 in ?? () (gdb) c Continuing. Breakpoint 1, runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450 450 MOVQ addr+0(FP), DI (gdb) bt #0 runtime.sysMmap () at /home/lizgao/go/src/runtime/sys_linux_amd64.s:450 #1 0x0000000000403261 in runtime.mmap (addr=0x0, fd=-1, flags=34, n=2162688, off=0, prot=3, ~r6=, ~r7=) at /home/lizgao/go/src/runtime/cgo_mmap.go:39 #2 0x000000000041099e in runtime.sysAlloc (n=2162688, sysStat=, ~r2=) at /home/lizgao/go/src/runtime/mem_linux.go:21 #3 0x000000000040a899 in runtime.persistentalloc1 (align=67108864, size=2162688, sysStat=0x4d6cb8 , ~r3=) at /home/lizgao/go/src/runtime/malloc.go:1172 #4 0x0000000000444765 in runtime.persistentalloc.func1 () at /home/lizgao/go/src/runtime/malloc.go:1143 #5 0x000000000040a632 in runtime.persistentalloc (align=, size=, sysStat=, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:1142 #6 0x0000000000408f9c in runtime.(*mheap).sysAlloc (h=0x4be660 , n=67108864, size=, v=) at /home/lizgao/go/src/runtime/malloc.go:640 #7 0x000000000041c372 in runtime.(*mheap).grow (h=0x4be660 , npage=, ~r1=) at /home/lizgao/go/src/runtime/mheap.go:919 #8 0x000000000041c1f7 in runtime.(*mheap).allocSpanLocked (h=0x4be660 , npage=1, stat=0x4d6c68 , ~r2=) at /home/lizgao/go/src/runtime/mheap.go:848 #9 0x000000000041ba09 in runtime.(*mheap).alloc_m (h=0x4be660 , large=false, npage=1, spanclass=42 '*', ~r3=) at /home/lizgao/go/src/runtime/mheap.go:692 #10 0x00000000004451ac in runtime.(*mheap).alloc.func1 () at /home/lizgao/go/src/runtime/mheap.go:759 #11 0x000000000041bcaa in runtime.(*mheap).alloc (h=, large=, needzero=, npage=, spanclass=, ~r4=) at /home/lizgao/go/src/runtime/mheap.go:758 #12 0x0000000000410874 in runtime.(*mcentral).grow (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:232 #13 0x00000000004103c8 in runtime.(*mcentral).cacheSpan (c=0x4c0418 , ~r0=) at /home/lizgao/go/src/runtime/mcentral.go:106 #14 0x000000000040ff85 in runtime.(*mcache).refill (c=0x7ffff7fbd000, spc=42 '*') at /home/lizgao/go/src/runtime/mcache.go:118 #15 0x0000000000444692 in runtime.(*mcache).nextFree.func1 () at /home/lizgao/go/src/runtime/malloc.go:749 #16 0x0000000000409536 in runtime.(*mcache).nextFree (c=0x7ffff7fbd000, s=, shouldhelpgc=, spc=42 '*', v=) at /home/lizgao/go/src/runtime/malloc.go:748 #17 0x0000000000409e83 in runtime.mallocgc (needzero=true, size=384, typ=0x46baa0, ~r3=) at /home/lizgao/go/src/runtime/malloc.go:903 #18 0x000000000040a268 in runtime.newobject (typ=, ~r1=) at /home/lizgao/go/src/runtime/malloc.go:1032 #19 0x000000000042b411 in runtime.malg (stacksize=, ~r1=) at /home/lizgao/go/src/runtime/proc.go:3285 #20 0x0000000000420649 in runtime.mpreinit (mp=) at /home/lizgao/go/src/runtime/os_linux.go:311 #21 0x0000000000424ff1 in runtime.mcommoninit (mp=0x4ba3a0 ) at /home/lizgao/go/src/runtime/proc.go:624 #22 0x0000000000424cb9 in runtime.schedinit () at /home/lizgao/go/src/runtime/proc.go:546 #23 0x0000000000446b8a in runtime.rt0_go () at /home/lizgao/go/src/runtime/asm_amd64.s:195 #24 0x0000000000000001 in ?? () #25 0x00007fffffffe578 in ?? () #26 0x0000000000000001 in ?? () #27 0x00007fffffffe578 in ?? () #28 0x0000000000000000 in ?? () (gdb) |
@gaolizhou, thanks for the backtrace. We do understand exactly what's causing this issue. It's just an unintended consequence of a very intentional change that switched from mapping the heap in small increments to mapping it in 64MB increments. |
Do we have a fix in mind? If not, does the benefit of that change outweigh the regression in core sizes? |
Map arenas I don't think this is very hard to implement or adds much complexity, though everything in |
Change https://golang.org/cl/175497 mentions this issue: |
Just a few adjustments to handle runtime changes. Use zip files for test data so we don't use so much disk space for large core files. See golang/go#28081. Fixes golang/go#30631 Change-Id: I6a11aae56e715540fc320cbe70e426eeafa2f64d Reviewed-on: https://go-review.googlesource.com/c/debug/+/175497 Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Is there a way to switch this back to the old behavior either at compile or runtime? We have a light weight application that we had been running in a 64MB sized VM with no issue. With this change, when we build it on the newer tool chain (>= 1.11) it won't start in those small VMs. We get "fatal error: runtime: out of memory". |
@johnsom, no, there's is no way to switch this behavior. But passing |
Sadly our code does not call mlockall at all, it only uses libraries. The trace is here: `fatal error: runtime: out of memory runtime stack: |
This comment has been minimized.
This comment has been minimized.
Is this a Linux VM? If so, simply mapping more memory shouldn't cause such problems, unless you have ulimit -v set low or overcommit configured conservatively. Are either of these the case for you? It's possible you're seeing this for other reasons. For example, the RSS of your application increased with newer versions of Go, which could be a legitimate bug but distinct from anything in this issue. |
Seems like the current go syscall is not supporting |
Run with
Go 1.11 and tip generate a ~100MB core. Go 1.10 and earlier only generate a ~2MB core.
Probably related to the arena changes in 1.11.
@aclements @heschik @hyangah
The text was updated successfully, but these errors were encountered: