
runtime: OOM when running in sparse available memory system #39400

Open
jeremyshisong opened this issue Jun 4, 2020 · 15 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone
Backlog
Comments

@jeremyshisong

What version of Go are you using (go version)?

$ go version
go version go1.11 linux/amd64

Does this issue reproduce with the latest release?

YES

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/data/golang"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go1.11"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go1.11/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build473055106=/tmp/go-build -gno-record-gcc-switches"

What did you do?

When I use Go 1.11 to compile Elastic Metricbeat at tag v7.0.0, the resulting binary runs out of memory when the system's available memory is scarce.

The system memory usage looks like this:

free -m
              total        used        free      shared  buff/cache   available
Mem:           1868        1686          14          97         167          43
Swap:             0           0           0

What did you expect to see?

Although the system's available memory is scarce, Metricbeat uses less than 10 MB of memory once started, so I expected it to start successfully.

What did you see instead?

It fails to start, dying with an out-of-memory error.

The runtime stack trace from the OOM is:

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x283d1da, 0x16)
        /usr/local/go1.11/src/runtime/panic.go:608 +0x72 fp=0x7ffc2108ab90 sp=0x7ffc2108ab60 pc=0x11123e2
runtime.sysMap(0xc000000000, 0x4000000, 0x3e26558)
        /usr/local/go1.11/src/runtime/mem_linux.go:156 +0xc7 fp=0x7ffc2108abd0 sp=0x7ffc2108ab90 pc=0x10fd687
runtime.(*mheap).sysAlloc(0x3e0c560, 0x4000000, 0x0, 0x0)
        /usr/local/go1.11/src/runtime/malloc.go:619 +0x1c7 fp=0x7ffc2108ac58 sp=0x7ffc2108abd0 pc=0x10f14d7
runtime.(*mheap).grow(0x3e0c560, 0x1, 0x0)
        /usr/local/go1.11/src/runtime/mheap.go:920 +0x42 fp=0x7ffc2108acb0 sp=0x7ffc2108ac58 pc=0x11099c2
runtime.(*mheap).allocSpanLocked(0x3e0c560, 0x1, 0x3e26568, 0x0)
        /usr/local/go1.11/src/runtime/mheap.go:848 +0x337 fp=0x7ffc2108acf0 sp=0x7ffc2108acb0 pc=0x1109847
runtime.(*mheap).alloc_m(0x3e0c560, 0x1, 0x2a, 0x0)
        /usr/local/go1.11/src/runtime/mheap.go:692 +0x119 fp=0x7ffc2108ad30 sp=0x7ffc2108acf0 pc=0x1109059
runtime.(*mheap).alloc.func1()
        /usr/local/go1.11/src/runtime/mheap.go:759 +0x4c fp=0x7ffc2108ad68 sp=0x7ffc2108ad30 pc=0x113ef1c
runtime.(*mheap).alloc(0x3e0c560, 0x1, 0x101002a, 0x7ffc2108add0)
        /usr/local/go1.11/src/runtime/mheap.go:758 +0x8a fp=0x7ffc2108adb8 sp=0x7ffc2108ad68 pc=0x11092fa
runtime.(*mcentral).grow(0x3e0e318, 0x0)
        /usr/local/go1.11/src/runtime/mcentral.go:232 +0x94 fp=0x7ffc2108ae00 sp=0x7ffc2108adb8 pc=0x10fd084
runtime.(*mcentral).cacheSpan(0x3e0e318, 0x0)
        /usr/local/go1.11/src/runtime/mcentral.go:106 +0x2f8 fp=0x7ffc2108ae48 sp=0x7ffc2108ae00 pc=0x10fcbd8
runtime.(*mcache).refill(0x7f237641b000, 0x220000002a)
        /usr/local/go1.11/src/runtime/mcache.go:122 +0x95 fp=0x7ffc2108ae78 sp=0x7ffc2108ae48 pc=0x10fc795
runtime.(*mcache).nextFree.func1()
        /usr/local/go1.11/src/runtime/malloc.go:749 +0x32 fp=0x7ffc2108ae98 sp=0x7ffc2108ae78 pc=0x113e332
runtime.(*mcache).nextFree(0x7f237641b000, 0x3e2652a, 0x4000, 0x7f237641b000, 0x7ffc2108af58)
        /usr/local/go1.11/src/runtime/malloc.go:748 +0xb6 fp=0x7ffc2108aef0 sp=0x7ffc2108ae98 pc=0x10f1b86
runtime.mallocgc(0x180, 0x2807a20, 0x7ffc2108af01, 0x7f237641f000)
        /usr/local/go1.11/src/runtime/malloc.go:903 +0x793 fp=0x7ffc2108af90 sp=0x7ffc2108aef0 pc=0x10f24d3
runtime.newobject(0x2807a20, 0x3e265c0)
        /usr/local/go1.11/src/runtime/malloc.go:1032 +0x38 fp=0x7ffc2108afc0 sp=0x7ffc2108af90 pc=0x10f28b8
runtime.malg(0x7f2300008000, 0x7f237641b000)
        /usr/local/go1.11/src/runtime/proc.go:3285 +0x31 fp=0x7ffc2108b000 sp=0x7ffc2108afc0 pc=0x111c661
runtime.mpreinit(0x3e06840)
        /usr/local/go1.11/src/runtime/os_linux.go:311 +0x29 fp=0x7ffc2108b020 sp=0x7ffc2108b000 pc=0x1110819
runtime.mcommoninit(0x3e06840)
        /usr/local/go1.11/src/runtime/proc.go:624 +0xc1 fp=0x7ffc2108b058 sp=0x7ffc2108b020 pc=0x1115f71
runtime.schedinit()
        /usr/local/go1.11/src/runtime/proc.go:546 +0x89 fp=0x7ffc2108b0c0 sp=0x7ffc2108b058 pc=0x1115c39
runtime.rt0_go(0x7ffc2108b1c8, 0x1, 0x7ffc2108b1c8, 0x0, 0x7f2375b95c05, 0x2000000000, 0x7ffc2108b1c8, 0x100000000, 0x1140e50, 0x0, ...)
        /usr/local/go1.11/src/runtime/asm_amd64.s:195 +0x11a fp=0x7ffc2108b0c8 sp=0x7ffc2108b0c0 pc=0x1140f7a

However, when I use Go 1.10.8 to build the same Elastic Metricbeat code, it starts without OOM, also with only about 40 MB of available system memory.

Are there any breaking changes starting from Go 1.11?

@davecheney
Contributor

Go 1.11 is no longer supported (http://golang.org/s/release). Can you please retest with Go 1.14?

@davecheney added the WaitingForInfo label (Issue is not actionable because of missing required information, which needs to be provided) Jun 4, 2020
@jeremyshisong
Author

@davecheney Thanks for your reply. I have tested with the latest release, Go 1.14.4, and the result is the same.

The runtime stack trace is:

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x26d9370, 0x16)
        /usr/local/go1.14.4/src/runtime/panic.go:1116 +0x72 fp=0x7fff2e91e338 sp=0x7fff2e91e308 pc=0x1034092
runtime.sysMap(0xc000000000, 0x4000000, 0x3afca58)
        /usr/local/go1.14.4/src/runtime/mem_linux.go:169 +0xc5 fp=0x7fff2e91e378 sp=0x7fff2e91e338 pc=0x1017d55
runtime.(*mheap).sysAlloc(0x3ae7920, 0x400000, 0x0, 0x0)
        /usr/local/go1.14.4/src/runtime/malloc.go:715 +0x1cd fp=0x7fff2e91e420 sp=0x7fff2e91e378 pc=0x100b17d
runtime.(*mheap).grow(0x3ae7920, 0x1, 0x0)
        /usr/local/go1.14.4/src/runtime/mheap.go:1286 +0x11c fp=0x7fff2e91e488 sp=0x7fff2e91e420 pc=0x102600c
runtime.(*mheap).allocSpan(0x3ae7920, 0x1, 0xd352a00, 0x3afca68, 0x1)
        /usr/local/go1.14.4/src/runtime/mheap.go:1124 +0x6a0 fp=0x7fff2e91e508 sp=0x7fff2e91e488 pc=0x1025d50
runtime.(*mheap).alloc.func1()
        /usr/local/go1.14.4/src/runtime/mheap.go:871 +0x64 fp=0x7fff2e91e560 sp=0x7fff2e91e508 pc=0x10630e4
runtime.(*mheap).alloc(0x3ae7920, 0x1, 0x12a, 0x1)
        /usr/local/go1.14.4/src/runtime/mheap.go:865 +0x81 fp=0x7fff2e91e5b0 sp=0x7fff2e91e560 pc=0x10252b1
runtime.(*mcentral).grow(0x3af87f8, 0x0)
        /usr/local/go1.14.4/src/runtime/mcentral.go:255 +0x79 fp=0x7fff2e91e5f0 sp=0x7fff2e91e5b0 pc=0x1017779
runtime.(*mcentral).cacheSpan(0x3af87f8, 0x0)
        /usr/local/go1.14.4/src/runtime/mcentral.go:106 +0x2bc fp=0x7fff2e91e638 sp=0x7fff2e91e5f0 pc=0x10172ac
runtime.(*mcache).refill(0x7f8695204108, 0x2a)
        /usr/local/go1.14.4/src/runtime/mcache.go:138 +0x85 fp=0x7fff2e91e658 sp=0x7fff2e91e638 pc=0x1016d95
runtime.(*mcache).nextFree(0x7f8695204108, 0x7f869520412a, 0xfffffffffffffff8, 0x7fff2e91e6e0, 0x7fff2e91e708)
        /usr/local/go1.14.4/src/runtime/malloc.go:868 +0x87 fp=0x7fff2e91e690 sp=0x7fff2e91e658 pc=0x100b9a7
runtime.mallocgc(0x180, 0x26a4580, 0x7fff2e91e701, 0x100cb42)
        /usr/local/go1.14.4/src/runtime/malloc.go:1036 +0x793 fp=0x7fff2e91e730 sp=0x7fff2e91e690 pc=0x100c2e3
runtime.newobject(0x26a4580, 0x4000)
        /usr/local/go1.14.4/src/runtime/malloc.go:1165 +0x38 fp=0x7fff2e91e760 sp=0x7fff2e91e730 pc=0x100c6d8
runtime.malg(0x8000, 0x3afcac0)
        /usr/local/go1.14.4/src/runtime/proc.go:3360 +0x31 fp=0x7fff2e91e7a0 sp=0x7fff2e91e760 pc=0x103f3a1
runtime.mpreinit(0x3acfc40)
        /usr/local/go1.14.4/src/runtime/os_linux.go:339 +0x2d fp=0x7fff2e91e7c0 sp=0x7fff2e91e7a0 pc=0x103119d
runtime.mcommoninit(0x3acfc40)
        /usr/local/go1.14.4/src/runtime/proc.go:630 +0x108 fp=0x7fff2e91e808 sp=0x7fff2e91e7c0 pc=0x1038898
runtime.schedinit()
        /usr/local/go1.14.4/src/runtime/proc.go:547 +0x95 fp=0x7fff2e91e860 sp=0x7fff2e91e808 pc=0x10384b5
runtime.rt0_go(0x7fff2e91e968, 0x1, 0x7fff2e91e968, 0x0, 0x7f869497ec05, 0x2000000000, 0x7fff2e91e968, 0x100000000, 0x1065850, 0x0, ...)
        /usr/local/go1.14.4/src/runtime/asm_amd64.s:214 +0x125 fp=0x7fff2e91e868 sp=0x7fff2e91e860 pc=0x1065985

@davecheney
Contributor

Thank you for confirming that the problem occurs with Go 1.14.

As an experiment, are you able to add a small amount of swap to the system, and verify if the problem continues to occur?

@jeremyshisong
Author

@davecheney After creating about 100 MB of swap space and setting swappiness=0 to avoid swapping whenever possible, it runs successfully without OOM.

Here is the system memory usage:

free -m
              total        used        free      shared  buff/cache   available
Mem:           1868        1772          15          23          80          36
Swap:           102         102           0
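
For reference, the swap space was set up with commands along these lines (the /swapfile path is illustrative):

$ dd if=/dev/zero of=/swapfile bs=1M count=100
$ chmod 600 /swapfile
$ mkswap /swapfile
$ swapon /swapfile
$ sysctl -w vm.swappiness=0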

@davecheney
Contributor

ping @mknyszek

@mknyszek
Contributor

mknyszek commented Jun 4, 2020

As of Go 1.11, heap arenas are mapped in 64 MiB increments on linux/amd64 (golang.org/cl/85887). Looking at the stack trace, it looks like there's a failure to map the first arena (as read-write). In theory this shouldn't happen with default settings on Linux, because we don't immediately go and touch all of those 64 MiB; i.e., we map the memory but don't use it unless we actually need it. Given that it's the actual mmap call that's failing, it looks like the OS is simply telling us we can't map that memory, for whatever reason.
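
A minimal standalone sketch of the failing step (this is not the runtime's actual code, just the same kind of anonymous read-write mapping that runtime.sysMap ends up issuing):

package main

import (
	"fmt"
	"syscall"
)

func main() {
	const arenaSize = 64 << 20 // heap arena size on linux/amd64 since Go 1.11

	// An anonymous, private, read-write mapping, roughly what the
	// failing runtime.sysMap boils down to. Under strict overcommit
	// accounting (vm.overcommit_memory=2, or the mode-0 heuristic on
	// a tight machine) the mmap itself can return ENOMEM even though
	// none of the 64 MiB is ever touched.
	mem, err := syscall.Mmap(-1, 0, arenaSize,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		fmt.Println("mmap failed:", err) // e.g. ENOMEM when overcommit says no
		return
	}
	defer syscall.Munmap(mem)
	fmt.Printf("mapped %d MiB without touching it\n", len(mem)>>20)
}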

One possible cause is overcommit rules; another is virtual address space limits. Can you show me the output of

$ cat /proc/sys/vm/overcommit_memory
$ cat /proc/sys/vm/overcommit_ratio
$ ulimit -v

on your system? Also, what version of Linux are you running on?

A workaround here could be to compile for GOARCH=386, where we use 4 MiB arenas; you should be able to map at least 2 arenas before Linux thinks you're out of memory. It will likely mean slower code overall (bit intrinsics are missing, for example, and they're used on a bunch of critical paths in the runtime; also 32-bit registers instead of 64-bit), but your code should work.
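
Concretely, that cross-compile would look roughly like this (the package path is illustrative):

$ GOOS=linux GOARCH=386 go build ./...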

@odeke-em changed the title from "go1.11 compiled elastic metricbeat result in oom when running in sparse available memory system" to "runtime: OOM when running in sparse available memory system" Jun 5, 2020
@jeremyshisong
Author

@mknyszek Thanks for your detailed explanation.

The output of the system configs is as follows:

$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio
50
$ ulimit -v
unlimited

The Linux kernel version is 3.10.107.

It looks like the overcommit rules prevented mapping the 64 MiB heap arena.

As an experiment, I set overcommit_memory to 1, and it worked without OOM.
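
For reference, that setting can be changed with either of:

$ sysctl -w vm.overcommit_memory=1
$ echo 1 > /proc/sys/vm/overcommit_memory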

The workaround of compiling for GOARCH=386 also worked, but as you explained it will likely mean slower code overall, so perhaps there is a better option.

Is the 64 MiB heap arena size hardcoded, or is it changeable via a compiler flag?

@networkimprov

@gopherbot remove WaitingForInfo

@gopherbot removed the WaitingForInfo label Jun 5, 2020
@dmitshur added the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one) Jun 5, 2020
@dmitshur added this to the Backlog milestone Jun 5, 2020
@laoshaw

laoshaw commented Jan 14, 2021

Running a hello-world net/http server on mips32, it asks for 800 MB of VSS (my board only has 128 MB of physical RAM). If I set overcommit_memory to 2 (i.e. no overcommit), the system immediately becomes unusable ("can't fork: out of memory") when I run 20+ of them in parallel. By comparison, I can run many more C/C++ HTTP processes in parallel; each of them takes only about 4 MB of VSS instead of 800 MB (on the RSS side it's 4 MB vs 1 MB, but the issue here is VSS).
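
For reference, the VSS/RSS figures per process can be checked with something along these lines (the process name is illustrative):

$ ps -o vsz,rss,comm -p $(pidof hello-http)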

Can we at least make this huge default VSS configurable in Go, so that Go can run well on resource-restricted Linux systems? TinyGo targets MCUs, but I'm talking about standard Go on embedded boards with 64 MB/128 MB/256 MB of RAM. Embedded systems sometimes set overcommit_memory to 2 and swappiness to 0 precisely to ensure the app won't run into OOM in the field for a very long time.

@networkimprov

@ianlancetaylor isn't there a proposal for what @laoshaw describes?

@ianlancetaylor
Contributor

Maybe, but I couldn't find it.

See also #38010.

@networkimprov

@laoshaw To suggest a configuration feature, file a new issue titled "proposal: [summary]" that describes the problem and how you'd like to solve it.

@laoshaw

laoshaw commented Jan 14, 2021

I don't know how to write a good proposal yet; I'm still new to Go. I just did a quick comparison to see whether it's feasible to shift future development to Go. The embedded boards I mentioned are probably as widely deployed as Go's microservice installation base in the cloud. All I want is a reasonable default VSS, with a super-high setting reserved for resource-rich server environments where it's absolutely necessary; the default value at least should not be that huge. Or let me pass a runtime environment flag to set my own VSS size, overriding whatever Go thinks is best. I tried 'ulimit -v' and it does not work either.

@networkimprov

Please at least file a separate bug report describing your problem.

@laoshaw

laoshaw commented Jan 14, 2021

Yes, it's filed here: #43699
