runtime: malloc crashes with 4TB arena #20882
Comments
You made a modification in the runtime and it's causing GC crashes. I'm, hum... not sure that this is a valid bug report. But maybe your patch is correct and it exposed a real issue, so cc @aclements in case he wants to investigate.
Well, this clearly isn't a supported configuration, but I'm happy to help debug. :) In principle it could work, but I'm not particularly surprised that it doesn't.
This isn't a double-free (munmap has no such concept), but the munmap syscall is failing. The runtime doesn't expect it to fail, so it intentionally crashes the program at a low level. Probably we should propagate that up and crash with a nicer message, but the real question is why munmap would fail. The address is aligned and the length is sane. The only other failure mode I know off the top of my head is running out of VMAs. If you could get the error code from munmap, it might shed some light on the problem. Probably just tweak sysMunmap to return the result of the syscall unconditionally, then do the equivalent error check where it's called from Go code and print out the error code before throwing. If it is the number of VMAs, you could just keep an eye on the number of lines in /proc/<pid>/maps.
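(A minimal, standalone illustration of seeing the errno from a failing munmap at the syscall level — this is not the suggested runtime patch; the runtime's sysMunmap is an assembly stub and would need the change described above. The unaligned address below is just a contrived way to force a failure; the VMA-limit case discussed in this issue would surface as ENOMEM rather than EINVAL.)

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Linux/amd64 only. munmap with a non-page-aligned address fails with
	// EINVAL; exceeding the kernel's VMA limit while splitting a mapping
	// would instead fail with ENOMEM.
	_, _, errno := syscall.Syscall(syscall.SYS_MUNMAP, 0x1001, 0x1000, 0)
	if errno != 0 {
		fmt.Printf("munmap failed: errno %d (%v)\n", int(errno), errno)
	}
}
```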
Thanks for the pointer @aclements - that definitely seems to be the issue. Seems to be a one-time spike, as the growth is fairly slow before and after this point. The map file basically contains 50k x 2MB regions, so 100GB worth of allocations in 5s. Probably would have crashed most systems... :) I'm guessing the GC takes a bit longer to recover all that, since adding any sort of GODEBUG or tweaking GOGC seems to mitigate the issue (GOGC=30 has 2900 maps at 70% complete, the default crashes before 1%). I'll continue debugging to see if I can find a way to reproduce consistently.
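(A rough way to keep an eye on the VMA count from Go, assuming Linux: /proc/self/maps has one line per VMA — substitute /proc/<pid>/maps to watch another process — and vm.max_map_count is the kernel's per-process limit, typically 65530 by default.)

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// One line per VMA in the current process.
	maps, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		panic(err)
	}
	vmas := strings.Count(string(maps), "\n")

	// Kernel's per-process limit on the number of VMAs.
	raw, err := os.ReadFile("/proc/sys/vm/max_map_count")
	if err != nil {
		panic(err)
	}
	limit, _ := strconv.Atoi(strings.TrimSpace(string(raw)))

	fmt.Printf("VMAs in use: %d (limit %d)\n", vmas, limit)
}
```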
Interesting. This happens to be the worst-case behavior of the scavenger on Linux. Take a look at the big comment in runtime/mem_linux.go:sysUnused for an explanation. It's not exactly that there were 100GB of allocations in 5s. Those allocations already happened. Then when the scavenger ran and returned unused memory to the OS, the runtime wound up fragmenting the flags on that range of the address space, which split it into the separate 2MB VMAs you're seeing. If you don't care about returning memory to the OS (it'll still be reused internally by Go), you could disable the scavenger by just returning from mheap.scavenge in mheap.go.
Such an interesting rabbit hole, I haven't been this deep in systems stuff in years... So malloc fails because it can't resize its span list, which fails because there are too many VMAs; there are too many because the kernel hasn't cleaned up the VMAs that were released by the scavenger on a recent GC. (And my code is terrible because it made 100GB of temporary allocations!) For a cluster environment like ours, my code is generally the only thing running on a large-memory node, so there's no real reason to keep the scavenger enabled. It'd be cool to do this at runtime (I see that …). Thanks so much for the help debugging and for the workaround!
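(There was no public API at the time for turning the scavenger off from inside the program, but the GOGC mitigation mentioned above can be applied at runtime through runtime/debug.SetGCPercent — a minimal sketch:)

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Equivalent to starting the process with GOGC=30: trigger GC when the
	// heap grows 30% past the live set, keeping the peak heap lower.
	old := debug.SetGCPercent(30)
	fmt.Printf("GOGC was %d, now 30\n", old)

	// ... allocation-heavy phase ...

	debug.SetGCPercent(old) // restore the previous setting afterwards
}
```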
It's actually a little subtler than this. The VMAs weren't released, and there's nothing for the kernel to clean up. The way the Go runtime releases memory returns the physical memory without releasing the virtual address space, so it actually doesn't affect the VMAs (which map virtual address space). But to avoid a quirk of Linux transparent huge page support, we change the flags on the virtual address space at a 2 MB granularity. Prior to this, the entire Go heap is represented by a few huge VMAs. But the flag changes force the kernel to break up these large VMAs into smaller VMAs, since a VMA has only a single set of flags associated with it. In the worst case, you get a checkerboard of flags with a distinct VMA covering every 2 MB region.
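(A small standalone sketch of that kernel behavior, assuming Linux with transparent huge page support — this is not the runtime's code path, just the same VMA-splitting effect: toggling a per-VMA flag such as MADV_NOHUGEPAGE at 2 MB granularity forces the kernel to split one large mapping into many.)

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"syscall"
)

// vmaCount returns the number of VMAs in the current process
// (one line per VMA in /proc/self/maps).
func vmaCount() int {
	data, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		panic(err)
	}
	return strings.Count(string(data), "\n")
}

func main() {
	const chunk = 2 << 20 // 2 MB
	const size = 64 << 20 // 64 MB

	before := vmaCount()

	// One large anonymous mapping: a single VMA.
	mem, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANON)
	if err != nil {
		panic(err)
	}

	// Change a per-VMA flag on every other 2 MB chunk. Each change forces
	// the kernel to split the VMA, producing a "checkerboard" of mappings.
	for off := 0; off < size; off += 2 * chunk {
		if err := syscall.Madvise(mem[off:off+chunk], syscall.MADV_NOHUGEPAGE); err != nil {
			panic(err) // fails with EINVAL if THP is not available
		}
	}

	fmt.Printf("VMAs before: %d, after: %d\n", before, vmaCount())
}
```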
Thanks for the interesting problem. :) I'm guessing we may have to solve this for real one day, but for now, since it's a niche problem and we've identified a few workarounds, I'm going to go ahead and close this issue.
Just following up, Go tip (to be released as Go 1.11) no longer has any direct limit on the heap size. The VMA count is likely to still be a problem for huge heaps.
I've made some minor changes to try to support 4TB of ram, but the increased stress seems to trigger a GC-related segfault very consistently around 750GB (graph below). Running with GOGC=off there are no issues up to ~3TB allocated (2nd graph below). I'm currently running with GOGC=30 and it's made a lot more progress.
I'll be using a pool eventually (this code is very unoptimized), so there will be less GC stress then... but I figured I'd at least report the crash just in case there's something obvious going on or this can aid testing.
This is a cluster node (64 cores available, approx 30-50 active during the crash). I have limited direct access but I can write memprofiles or other tests/data dumps if that helps.
What version of Go are you using (go version)?

go version devel +14b07df Fri Jun 30 23:48:06 2017 +0000 linux/amd64

What operating system and processor architecture are you using (go env)?

What did you do?
I changed runtime/malloc.go to set _MHeapMap_TotalBits = 42 to support 4TB of RAM. I also limited arena allocation to ensure it didn't cross 0x00007fffffffffff due to the large arena size (doesn't appear to be relevant here).

What did you expect to see?
No segfault - or at least a segfault at a more predictable power-of-two total allocation.
What did you see instead?
Looks like a double-free caused after mheap_.allspans is grown, after hitting ~750GB.

Relevant snippet of stack dump:
Graph 1: default GOGC

Graph 2: with GOGC=off (hits OOM at the end)