runtime: malloc crashes with 4TB arena #20882

Closed
pbnjay opened this issue Jul 2, 2017 · 7 comments

@pbnjay
Contributor

pbnjay commented Jul 2, 2017

I've made some minor changes to try to support 4TB of RAM, but the increased stress seems to trigger a GC-related segfault very consistently around 750GB (graph below). Running with GOGC=off there are no issues up to ~3TB allocated (2nd graph below). I'm currently running with GOGC=30 and it's made a lot more progress.

I'll be using a pool eventually (this code is very unoptimized), so there will be less GC stress then... but I figured I'd at least report the crash just in case there's something obvious going on or this can aid testing.

This is a cluster node (64 cores available, approx 30-50 active during the crash). I have limited direct access but I can write memprofiles or other tests/data dumps if that helps.

What version of Go are you using (go version)?

go version devel +14b07df Fri Jun 30 23:48:06 2017 +0000 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/users/jjay"
GORACE=""
GOROOT="/users/jjay/src/github.com/golang/go"
GOTOOLDIR="/users/jjay/src/github.com/golang/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build333923340=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

I changed runtime/malloc.go to set _MHeapMap_TotalBits=42 to support 4TB of RAM. I also limited arena allocation to ensure it didn't cross 0x00007fffffffffff due to the larger arena size (this doesn't appear to be relevant here).
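
For reference, the change amounts to bumping one constant in runtime/malloc.go. A sketch of what that looks like (illustrative only, not upstream code; the stock linux/amd64 value of that era works out to 39 bits, i.e. a 512 GB arena, and 42 bits gives 4 TB):

```go
// Sketch of the local patch in runtime/malloc.go (illustrative only).
// The stock linux/amd64 arena is 2^39 bytes = 512 GB; 2^42 = 4 TB.
_MHeapMap_TotalBits = 42
_MHeapMap_Bits      = _MHeapMap_TotalBits - _PageShift // derived as in the original source
```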

What did you expect to see?

No segfault - or at least a segfault at a more predictable power-of-two total allocation.

What did you see instead?

Looks like a double-free triggered when mheap_.allspans is grown after hitting ~750GB.

Relevant snippet of stack dump:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xf1 pc=0x454fcb]

runtime stack:
runtime.throw(0x51855c, 0x2a)
        /users/jjay/src/github.com/golang/go/src/runtime/panic.go:605 +0x95
runtime.sigpanic()
        /users/jjay/src/github.com/golang/go/src/runtime/signal_unix.go:351 +0x2b8
runtime.sysMunmap(0x2b494827f000, 0x5c5e438, 0x2b494827f000, 0x2b497b8be000, 0x8a8d650, 0xe1617ddd40, 0x41407e, 0x2b494827f000, 0x5c5e438, 0xe1617ddde0, ...)
        /users/jjay/src/github.com/golang/go/src/runtime/sys_linux_amd64.s:403 +0x1b
runtime.munmap(0x2b494827f000, 0x5c5e438)
        /users/jjay/src/github.com/golang/go/src/runtime/cgo_mmap.go:45 +0x79
runtime.sysFree(0x2b494827f000, 0x5c5e438, 0x5cc660)
        /users/jjay/src/github.com/golang/go/src/runtime/mem_linux.go:176 +0x3e
runtime.recordspan(0x5b3fc0, 0x2b497b3cfe48)
        /users/jjay/src/github.com/golang/go/src/runtime/mheap.go:336 +0x270
runtime.(*fixalloc).alloc(0x5b7490, 0x2b497b3cfdb0)
        /users/jjay/src/github.com/golang/go/src/runtime/mfixalloc.go:86 +0xdd
runtime.(*mheap).allocSpanLocked(0x5b3fc0, 0x16, 0x5cc608, 0x2b4963261790)
        /users/jjay/src/github.com/golang/go/src/runtime/mheap.go:820 +0x15c
runtime.(*mheap).alloc_m(0x5b3fc0, 0x16, 0xe1617d0101, 0x413b0c)
        /users/jjay/src/github.com/golang/go/src/runtime/mheap.go:661 +0x118
runtime.(*mheap).alloc.func1()
        /users/jjay/src/github.com/golang/go/src/runtime/mheap.go:728 +0x4d
runtime.systemstack(0xe1617ddf08)
        /users/jjay/src/github.com/golang/go/src/runtime/asm_amd64.s:360 +0xab
runtime.(*mheap).alloc(0x5b3fc0, 0x16, 0xe161010101, 0x413174)
        /users/jjay/src/github.com/golang/go/src/runtime/mheap.go:727 +0xa1
runtime.largeAlloc(0x2bf20, 0x2b48dc420101, 0x2b497b3cea75)
        /users/jjay/src/github.com/golang/go/src/runtime/malloc.go:824 +0x98
runtime.mallocgc.func1()
        /users/jjay/src/github.com/golang/go/src/runtime/malloc.go:719 +0x46
runtime.systemstack(0xe100058c00)
        /users/jjay/src/github.com/golang/go/src/runtime/asm_amd64.s:344 +0x79
runtime.mstart()
        /users/jjay/src/github.com/golang/go/src/runtime/proc.go:1125

goroutine 1189 [running]:
runtime.systemstack_switch()
        /users/jjay/src/github.com/golang/go/src/runtime/asm_amd64.s:298 fp=0xfb0a217a80 sp=0xfb0a217a78 pc=0x451570
runtime.mallocgc(0x2bf20, 0x4e6b20, 0xfb0a217a01, 0x17a5a142e70)
        /users/jjay/src/github.com/golang/go/src/runtime/malloc.go:718 +0x7b8 fp=0xfb0a217b28 sp=0xfb0a217a80 pc=0x40f908
runtime.makeslice(0x4e6b20, 0x57e4, 0x57e4, 0x13db3270000, 0x0, 0x729)
        /users/jjay/src/github.com/golang/go/src/runtime/slice.go:54 +0x77 fp=0xfb0a217b58 sp=0xfb0a217b28 pc=0x43d417

Default GOGC

[image: memory allocation graph; crash around 750GB]

With GOGC=off (hits OOM at the end)

[image: memory allocation graph; ~3TB allocated before hitting OOM]

@ALTree
Member

ALTree commented Jul 2, 2017

What did you do?
I changed runtime/malloc.go to set _MHeapMap_TotalBits=42 to support 4TB of ram.

You made a modification in the runtime and it's causing GC crashes. I'm, hum... not sure that this is a valid bug report.

But maybe your patch is correct and it exposed a real issue, so cc @aclements in case he wants to investigate.

@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 2, 2017
@aclements
Member

Well, this clearly isn't a supported configuration, but I'm happy to help debug. :) In principle it could work, but I'm not particularly surprised that it doesn't.

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xf1 pc=0x454fcb]

runtime stack:
runtime.throw(0x51855c, 0x2a)
        /users/jjay/src/github.com/golang/go/src/runtime/panic.go:605 +0x95
runtime.sigpanic()
        /users/jjay/src/github.com/golang/go/src/runtime/signal_unix.go:351 +0x2b8
runtime.sysMunmap(0x2b494827f000, 0x5c5e438, 0x2b494827f000, 0x2b497b8be000, 0x8a8d650, 0xe1617ddd40, 0x41407e, 0x2b494827f000, 0x5c5e438, 0xe1617ddde0, ...)
        /users/jjay/src/github.com/golang/go/src/runtime/sys_linux_amd64.s:403 +0x1b

This isn't a double-free (munmap has no such concept), but the munmap syscall is failing. The runtime doesn't expect it to fail, so it intentionally crashes the program at a low level. Probably we should propagate that up and crash with a nicer message, but the real question is why would munmap fail? The address is aligned and the length is sane. The only other failure mode I know off the top of my head is running out of VMAs.

If you could get the error code from munmap, it might shed some light on the problem. Probably just tweak sysMunmap to return the result of the syscall unconditionally and then do the equivalent error check where it's called from Go code and print out the error code before throwing.
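
For what it's worth, here is a small standalone program (not the runtime patch itself, just an illustration of the same check): it issues a raw munmap syscall and prints the errno instead of crashing on it. ENOMEM is what you would expect if the VMA limit is the culprit.

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

func main() {
	// Map one anonymous page, then unmap it with a raw syscall so the
	// result is visible rather than being checked (and crashed on) for us.
	p, err := syscall.Mmap(-1, 0, 4096,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	addr := uintptr(unsafe.Pointer(&p[0]))
	if _, _, errno := syscall.Syscall(syscall.SYS_MUNMAP, addr, 4096, 0); errno != 0 {
		// ENOMEM here would point at vm.max_map_count being exhausted.
		fmt.Println("munmap failed:", errno)
		return
	}
	fmt.Println("munmap ok")
}
```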

If it is the number of VMAs, you could just keep an eye on the number of lines in /proc/PID/maps. If that exceeds the value of sysctl vm.max_map_count, then that's what's causing the failure. In this case it's still an interesting question as to why we're winding up with so many mappings, but you can just set vm.max_map_count higher to work around it.
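
A minimal watcher along those lines (my own sketch; the PID argument and one-second interval are arbitrary):

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"os"
	"strings"
	"time"
)

func main() {
	pid := os.Args[1] // PID of the Go process under test
	limit, _ := ioutil.ReadFile("/proc/sys/vm/max_map_count")
	for {
		maps, err := ioutil.ReadFile("/proc/" + pid + "/maps")
		if err != nil {
			fmt.Println("read maps:", err) // process exited (or crashed)
			return
		}
		fmt.Printf("VMAs: %d (vm.max_map_count = %s)\n",
			bytes.Count(maps, []byte{'\n'}), strings.TrimSpace(string(limit)))
		time.Sleep(time.Second)
	}
}
```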

@aclements aclements changed the title runtime: malloc crashes with large memory and default GOGC runtime: malloc crashes with 4TB arena Jul 3, 2017
@ALTree ALTree added this to the Unplanned milestone Jul 3, 2017
@pbnjay
Contributor Author

pbnjay commented Jul 3, 2017

Thanks for the pointer @aclements - that definitely seems to be the issue. vm.max_map_count = 65530, and I looped printing the line count of /proc/PID/maps: it jumps from 29 to 56119 in under 5s, approximately just before the point where the crash normally occurs (of course it doesn't crash when doing this /s).

Seems to be a one-time spike, as the growth is fairly slow before and after this point. The map file basically contains 50k x 2MB regions, so 100GB worth of allocations in 5s. Probably would have crashed most systems... :) I'm guessing the GC takes a bit longer to recover all that, since adding any sort of GODEBUG or tweaking GOGC seems to mitigate the issue (GOGC=30 has 2900 maps at 70% complete, default crashes before 1%).

I'll continue debugging to see if I can find a way to reproduce consistently.

@aclements
Member

The map file basically contains 50k x 2MB regions, so 100GB worth of allocations in 5s.

Interesting. This happens to be the worst-case behavior of the scavenger on Linux. Take a look at the big comment in runtime/mem_linux.go:sysUnused for an explanation. It's not exactly that there were 100GB of allocations in 5s. Those allocations already happened. Then when the scavenger ran and returned unused memory to the OS, the runtime wound up fragmenting the flags on that range of the address space, which split it into the separate 2MB VMAs you're seeing.

If you don't care about returning memory to the OS (it'll still be reused internally by Go), you could disable the scavenger by just returning from mheap.scavenge in mheap.go.
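
A minimal sketch of that workaround, assuming the mheap.scavenge signature in the tree at that commit (illustrative, not a supported patch):

```go
// runtime/mheap.go (local patch sketch): never release memory to the OS.
// Freed spans are still reused internally by the Go heap.
func (h *mheap) scavenge(k int32, now, limit uint64) {
	return // skip the madvise-based scavenging entirely
	// ... original scavenging logic below is now unreachable ...
}
```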

@pbnjay
Contributor Author

pbnjay commented Jul 5, 2017

Such an interesting rabbit hole, I haven't been this deep in system stuff in years... So malloc fails because it can't resize its span list, which fails because there are too many VMAs, which are too many because the kernel hasn't cleaned up the VMAs, which were released by the scavenger on a recent GC. (And my code is terrible because it made 100GB of temporary allocations!)

For a cluster environment like ours, my code is generally the only thing running on a large-memory node, so there's no real reason to keep the scavenger enabled. It'd be cool to do this at runtime (I see that GODEBUG=scavenge=2 would be trivial to add), but I'm sure this is not a common use case.
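
A hypothetical sketch of that knob, assuming the runtime's existing scavenge GODEBUG variable (alluded to above) could be reused; this is not an actual runtime feature, just what "trivial to add" might look like:

```go
// runtime/mheap.go (hypothetical): treat GODEBUG=scavenge=2 as "never scavenge".
func (h *mheap) scavenge(k int32, now, limit uint64) {
	if debug.scavenge > 1 {
		return // keep mappings intact; memory is still reused by the Go heap
	}
	// ... existing scavenging logic ...
}
```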

Thanks so much for the help debugging and for the workaround!

@aclements
Member

which are too many because the kernel hasn't cleaned up the VMAs, which were released by the scavenger on a recent GC.

It's actually a little subtler than this. The VMAs weren't released, and there's nothing for the kernel to clean up. The way the Go runtime releases memory gives back the physical memory without releasing the virtual address space, so it actually doesn't affect the VMAs (which map virtual address space). But to avoid a quirk of Linux transparent huge page support, we change the flags on the virtual address space at a 2 MB granularity. Prior to this, the entire Go heap is represented by a few huge VMAs. But the flag changes force the kernel to break up these large VMAs into smaller VMAs, since a VMA only has a single set of flags associated with it. In the worst case, you get a checkerboard of flags, with a distinct VMA covering every 2 MB region.
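
A standalone demonstration of that splitting effect (my own example, not runtime code; it needs a kernel with transparent huge pages enabled): map one large anonymous region, set MADV_NOHUGEPAGE on alternating 2 MB chunks, and watch the /proc/self/maps line count grow.

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"syscall"
)

func vmaCount() int {
	maps, _ := ioutil.ReadFile("/proc/self/maps")
	return bytes.Count(maps, []byte{'\n'})
}

func main() {
	const chunk = 2 << 20     // 2 MB, the granularity discussed above
	const total = 256 * chunk // one 512 MB anonymous mapping

	mem, err := syscall.Mmap(-1, 0, total,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	fmt.Println("VMAs before:", vmaCount())

	// Flipping a per-VMA flag on every other chunk forces the kernel to
	// split the one large VMA into many small ones (the checkerboard).
	for off := 0; off < total; off += 2 * chunk {
		if err := syscall.Madvise(mem[off:off+chunk], syscall.MADV_NOHUGEPAGE); err != nil {
			panic(err)
		}
	}
	fmt.Println("VMAs after:", vmaCount())
}
```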

Thanks so much for the help debugging and for the workaround!

Thanks for the interesting problem. :)

I'm guessing we may have to solve this for real one day, but for now, since it's a niche problem and we've identified a few workarounds, I'm going to go ahead and close this issue.

@aclements
Member

Just following up, Go tip (to be released as Go 1.11) no longer has any direct limit on the heap size.

The VMA count is likely to still be a problem for huge heaps.

@golang golang locked and limited conversation to collaborators Mar 28, 2019