
runtime: being unable to allocate across scavenged and unscavenged memory leads to fragmentation blow-up in some cases #35848

Open
savalin opened this issue Nov 26, 2019 · 5 comments
Comments

savalin commented Nov 26, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.13 darwin/amd64
go version go1.13.4 darwin/amd64

Does this issue reproduce with the latest release?

Yes. Tested on

go version go1.4-bootstrap-20170531 darwin/amd64

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/qwe/Library/Caches/go-build"
GOENV="/Users/qwe/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY="none"
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/qwe/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13.4/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13.4/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qv/37vc1dv526n0drc9jljj4v240000gp/T/go-build366955304=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

The original issue description is posted in the gonum library repo: gonum/gonum#1174

In a few words: we keep a huge graph in memory and need to update it periodically.
After initializing a new graph, we switch the pointer to the new graph and wait for the GC to free the unused memory.
But the runtime never reuses all of the HeapIdle memory and instead asks the OS for new pages, increasing HeapSys indefinitely.

A basic self-contained demo: https://github.com/savalin/example/blob/master/main.go.
It's simplified code, but even this shows some memory-management strangeness.
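For reference, a minimal sketch of the allocate-and-swap pattern described above (not the linked demo itself; buildGraph is a hypothetical stand-in for the real graph construction, and the sizes are arbitrary), printing the relevant MemStats fields each iteration:

```go
package main

import (
	"fmt"
	"runtime"
)

// buildGraph is a hypothetical stand-in for the real graph construction:
// it allocates roughly 1 GiB of heap memory in 1 MiB blocks.
func buildGraph() []*[1 << 20]byte {
	blocks := make([]*[1 << 20]byte, 1024)
	for i := range blocks {
		blocks[i] = new([1 << 20]byte)
	}
	return blocks
}

func main() {
	var graph []*[1 << 20]byte
	var ms runtime.MemStats
	for i := 0; i < 20; i++ {
		graph = buildGraph() // the previous graph becomes garbage here
		runtime.ReadMemStats(&ms)
		fmt.Printf("iter=%d HeapSys=%d MiB HeapIdle=%d MiB HeapReleased=%d MiB\n",
			i, ms.HeapSys>>20, ms.HeapIdle>>20, ms.HeapReleased>>20)
	}
	_ = graph
}
```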

What did you expect to see?

Reuse of HeapIdle memory instead of requests to the OS for new memory.

What did you see instead?

HeapIdle/HeapSys grow without bound (until the OOM killer wakes up and kills the app).

For now we have downgraded Go to v1.11.13 as a workaround, but we hope to be able to use newer Go releases.

dmitshur added the NeedsInvestigation label Nov 26, 2019
dmitshur (Contributor) commented:
/cc @aclements @mknyszek

(Please let me know if there are other people who are better to ping about an issue like this.)

mknyszek (Contributor) commented:
@savalin FWIW, I can't reproduce this at tip; HeapSys stays nice and low. This leads me to an interesting conclusion. Go 1.12 introduced the policy of treating memory that has been returned to the OS separately from memory that hasn't. This policy was carried through into Go 1.13, but in Go 1.14 we no longer do this, because it would be expensive and cumbersome to do so.

The reason this decision was made in the first place was to accommodate the more aggressive mechanism for returning memory to the OS in Go 1.13 (#30333). The decision was deemed a good idea because jemalloc and other allocators follow a similar rule for returning memory to the OS. It seems like for whatever reason this decision is causing fragmentation which really hurts this application.

Anyway, it shouldn't matter for future releases. This bug is effectively fixed for the Go 1.14 release. Please give the tip of the master branch a try and see if you get the same results.

Finally, this made me realize that debug.FreeOSMemory no longer works as it's supposed to in Go 1.14, so I'll open up an issue and fix that.

CC @aclements, since this is the first consequence we're seeing of that original decision.

Please continue reading if you'd like to see some additional notes I wrote while sussing this out.

  • HeapSys is expected to never decrease, since we never unmap memory in the runtime. Instead we call madvise, which increases HeapReleased.
  • In Go 1.13 we added a more aggressive scavenging mechanism (runtime: smarter scavenging #30333) which should keep HeapSys - HeapReleased closer to your actual working set, not accounting for certain amounts of fragmentation.
  • I noticed in the other thread that you tried GODEBUG=madvdontneed=1, but here you specify darwin/amd64; note that that option only has an effect on Linux, where it properly drops RSS counters at the cost of additional page faults (but the OS lets you overcommit; it'll just evict MADV_FREE'd pages when it needs them). We don't do the equivalent RSS-dropping on darwin/amd64 because it's non-trivial, though perhaps we could now that we rely on libSystem.
  • Looking at your example, HeapIdle == HeapReleased after you call debug.FreeOSMemory, which is exactly what that call does (see the sketch after this list). I'm not sure why your OOM killer is triggering, because HeapSys is just virtual memory. Do you have ulimit -v set? Perhaps this is a Darwin overcommit thing we need to work around.
  • There were temporarily some issues in Go 1.12/1.13 around fragmentation causing HeapSys to keep growing (runtime: treap implementation of find() doesn't return the best-fit span #31616) because of a bug. We also noticed a problem in Go 1.12 if we preferred unscavenged memory over scavenged memory in the allocator, so we decided to take the best fit between the two instead, which resolved the issue.
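To illustrate the HeapIdle == HeapReleased observation from the notes above, here is a minimal sketch (the allocation sizes are arbitrary, not taken from the original demo) that forces a release with debug.FreeOSMemory and prints both counters:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	// Allocate ~256 MiB, then drop all references so it is garbage.
	garbage := make([][]byte, 256)
	for i := range garbage {
		garbage[i] = make([]byte, 1<<20)
	}
	garbage = nil

	// Force a GC and return as much memory to the OS as possible.
	debug.FreeOSMemory()

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	// After FreeOSMemory, idle heap memory should also be accounted as
	// released, so HeapIdle and HeapReleased should be roughly equal.
	fmt.Printf("HeapIdle=%d MiB HeapReleased=%d MiB\n",
		ms.HeapIdle>>20, ms.HeapReleased>>20)
}
```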

mknyszek changed the title from "HeapIdle is increasing infinitely on big data volume allocations" to "runtime: being unable to allocate across scavenged and unscavenged memory leads to fragmentation blow-up in some cases" Nov 26, 2019
mknyszek (Contributor) commented:
@savalin I will close this once you're able to verify that this is fixed for you. If you're still experiencing problems, we'll move on from there.

@gopherbot commented (this comment has been minimized).

mknyszek self-assigned this Nov 27, 2019
mknyszek removed the NeedsInvestigation label Nov 27, 2019
mknyszek added this to the Backlog milestone Nov 27, 2019
mknyszek (Contributor) commented Dec 2, 2019

@savalin My mistake. This doesn't reproduce at tip as long as you put a runtime.GC() call at the very top of the for loop (when all the heap memory is actually dead).
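For clarity, a minimal sketch of that placement (buildGraph is a hypothetical stand-in for the expensive graph construction, not the demo's actual code):

```go
package main

import "runtime"

// buildGraph is a hypothetical stand-in for the expensive graph construction.
func buildGraph() [][]byte {
	g := make([][]byte, 512)
	for i := range g {
		g[i] = make([]byte, 1<<20) // ~512 MiB total
	}
	return g
}

func main() {
	var graph [][]byte
	for i := 0; i < 10; i++ {
		// Collect at the very top of the loop, when the previous
		// iteration's graph is entirely dead.
		runtime.GC()
		graph = buildGraph()
	}
	_ = graph
}
```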

I can get it to increase the same as in #35890. What @randall77 mentioned in that one is pretty much what I'm seeing here: it caps out at around 20 GiB.

cagedmantis added the NeedsInvestigation label Jun 23, 2022