runtime: malloc crashes with 4TB arena #20882
Comments
You made a modification in the runtime and it's causing GC crashes. I'm, hum... not sure that this is a valid bug report. But maybe your patch is correct and it exposed a real issue, so cc @aclements in case he wants to investigate.
Well, this clearly isn't a supported configuration, but I'm happy to help debug. :) In principle it could work, but I'm not particularly surprised that it doesn't.
This isn't a double-free (munmap has no such concept), but the munmap syscall is failing. The runtime doesn't expect it to fail, so it intentionally crashes the program at a low level. Probably we should propagate that up and crash with a nicer message, but the real question is why munmap would fail. The address is aligned and the length is sane. The only other failure mode I know off the top of my head is running out of VMAs. If you could get the error code from munmap, it might shed some light on the problem. Probably just tweak sysMunmap to return the result of the syscall unconditionally, then do the equivalent error check where it's called from Go code and print out the error code before throwing. If it is the number of VMAs, you could just keep an eye on the number of lines in /proc/<pid>/maps.
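(A minimal, standalone illustration of seeing the errno from a failing munmap at the syscall level — this is not the suggested runtime patch; the runtime's sysMunmap is an assembly stub and would need the change described above. The unaligned address below is just a contrived way to force a failure; the VMA-limit case discussed in this issue would surface as ENOMEM rather than EINVAL.)

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Linux/amd64 only. munmap with a non-page-aligned address fails with
	// EINVAL; exceeding the kernel's VMA limit while splitting a mapping
	// would instead fail with ENOMEM.
	_, _, errno := syscall.Syscall(syscall.SYS_MUNMAP, 0x1001, 0x1000, 0)
	if errno != 0 {
		fmt.Printf("munmap failed: errno %d (%v)\n", int(errno), errno)
	}
}
```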
Thanks for the pointer @aclements - that definitely seems to be the issue. Seems to be a one-time spike, as the growth is fairly slow before and after this point. The map file basically contains 50k x 2MB regions, so 100GB worth of allocations in 5s. Probably would have crashed most systems... :) I'm guessing the GC takes a bit longer to recover all that, since adding any sort of GODEBUG or tweaking GOGC seems to mitigate the issue (GOGC=30 has 2900 maps at 70% complete, the default crashes before 1%). I'll continue debugging to see if I can find a way to reproduce consistently.
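(A rough way to keep an eye on the VMA count from Go, assuming Linux: /proc/self/maps has one line per VMA — substitute /proc/<pid>/maps to watch another process — and vm.max_map_count is the kernel's per-process limit, typically 65530 by default.)

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// One line per VMA in the current process.
	maps, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		panic(err)
	}
	vmas := strings.Count(string(maps), "\n")

	// Kernel's per-process limit on the number of VMAs.
	raw, err := os.ReadFile("/proc/sys/vm/max_map_count")
	if err != nil {
		panic(err)
	}
	limit, _ := strconv.Atoi(strings.TrimSpace(string(raw)))

	fmt.Printf("VMAs in use: %d (limit %d)\n", vmas, limit)
}
```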
Interesting. This happens to be the worst-case behavior of the scavenger on Linux. Take a look at the big comment in runtime/mem_linux.go:sysUnused for an explanation. It's not exactly that there were 100GB of allocations in 5s. Those allocations already happened. Then when the scavenger ran and returned unused memory to the OS, the runtime wound up fragmenting the flags on that range of the address space, which split it into the separate 2MB VMAs you're seeing. If you don't care about returning memory to the OS (it'll still be reused internally by Go), you could disable the scavenger by just returning from mheap.scavenge in mheap.go.
Such an interesting rabbit hole, I haven't been this deep in systems stuff in years... So malloc fails because it can't resize its span list, which fails because there are too many VMAs; there are too many because the kernel hasn't cleaned up the VMAs that were released by the scavenger on a recent GC. (And my code is terrible because it made 100GB of temporary allocations!) For a cluster environment like ours, my code is generally the only thing running on a large-memory node, so there's no real reason to keep the scavenger enabled. It'd be cool to do this at runtime (I see that …). Thanks so much for the help debugging and for the workaround!
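(There was no public API at the time for turning the scavenger off from inside the program, but the GOGC mitigation mentioned above can be applied at runtime through runtime/debug.SetGCPercent — a minimal sketch:)

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Equivalent to starting the process with GOGC=30: trigger GC when the
	// heap grows 30% past the live set, keeping the peak heap lower.
	old := debug.SetGCPercent(30)
	fmt.Printf("GOGC was %d, now 30\n", old)

	// ... allocation-heavy phase ...

	debug.SetGCPercent(old) // restore the previous setting afterwards
}
```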
It's actually a little subtler than this. The VMAs weren't released, and there's nothing for the kernel to clean up. The way the Go runtime releases memory returns the physical memory without releasing the virtual address space, so it actually doesn't affect the VMAs (which map virtual address space). But to avoid a quirk of Linux transparent huge page support, we change the flags on the virtual address space at a 2 MB granularity. Prior to this, the entire Go heap is represented by a few huge VMAs. But the flag changes force the kernel to break up these large VMAs into smaller VMAs, since a VMA has only a single set of flags associated with it. In the worst case, you get a checkerboard of flags with a distinct VMA covering every 2 MB region.
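(A small standalone sketch of that kernel behavior, assuming Linux with transparent huge page support — this is not the runtime's code path, just the same VMA-splitting effect: toggling a per-VMA flag such as MADV_NOHUGEPAGE at 2 MB granularity forces the kernel to split one large mapping into many.)

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"syscall"
)

// vmaCount returns the number of VMAs in the current process
// (one line per VMA in /proc/self/maps).
func vmaCount() int {
	data, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		panic(err)
	}
	return strings.Count(string(data), "\n")
}

func main() {
	const chunk = 2 << 20 // 2 MB
	const size = 64 << 20 // 64 MB

	before := vmaCount()

	// One large anonymous mapping: a single VMA.
	mem, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON)
	if err != nil {
		panic(err)
	}

	// Change a per-VMA flag on every other 2 MB chunk. Each change forces
	// the kernel to split the VMA, producing a "checkerboard" of mappings.
	for off := 0; off < size; off += 2 * chunk {
		if err := syscall.Madvise(mem[off:off+chunk], syscall.MADV_NOHUGEPAGE); err != nil {
			panic(err) // fails with EINVAL if THP is not available
		}
	}

	fmt.Printf("VMAs before: %d, after: %d\n", before, vmaCount())
}
```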
Thanks for the interesting problem. :) I'm guessing we may have to solve this for real one day, but for now, since it's a niche problem and we've identified a few workarounds, I'm going to go ahead and close this issue.
Just following up, Go tip (to be released as Go 1.11) no longer has any direct limit on the heap size. The VMA count is likely to still be a problem for huge heaps.
I've made some minor changes to try to support 4TB of ram, but the increased stress seems to trigger a GC-related segfault very consistently around 750GB (graph below). Running with GOGC=off there are no issues up to ~3TB allocated (2nd graph below). I'm currently running with GOGC=30 and it's made a lot more progress.
I'll be using a pool eventually (this code is very unoptimized), so there will be less GC stress then... but I figured I'd at least report the crash just in case there's something obvious going on or this can aid testing.
This is a cluster node (64 cores available, approx 30-50 active during the crash). I have limited direct access but I can write memprofiles or other tests/data dumps if that helps.
What version of Go are you using (go version)?

go version devel +14b07df Fri Jun 30 23:48:06 2017 +0000 linux/amd64

What operating system and processor architecture are you using (go env)?

What did you do?
I changed runtime/malloc.go to set _MHeapMap_TotalBits = 42 to support 4TB of RAM. I also limited arena allocation to ensure it didn't cross 0x00007fffffffffff due to the large arena size (doesn't appear to be relevant here).

What did you expect to see?
No segfault - or at least a segfault at a more predictable power-of-two total allocation.
What did you see instead?
Looks like a double-free caused after mheap_.allspans is grown, after hitting ~750GB.

Relevant snippet of stack dump:
Graph 1: default GOGC

Graph 2: with GOGC=off (hits OOM at the end)