
runtime: being unable to allocate across scavenged and unscavenged memory leads to fragmentation blow-up in some cases #35848

Open
savalin opened this issue Nov 26, 2019 · 5 comments
Comments

savalin commented Nov 26, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.13 darwin/amd64
go version go1.13.4 darwin/amd64

Does this issue reproduce with the latest release?

Yes. Tested on

go version go1.4-bootstrap-20170531 darwin/amd64

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/qwe/Library/Caches/go-build"
GOENV="/Users/qwe/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY="none"
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/qwe/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13.4/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13.4/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qv/37vc1dv526n0drc9jljj4v240000gp/T/go-build366955304=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

The original issue description is posted in the gonum library repo: gonum/gonum#1174

In a few words: we keep a huge graph in memory and need to update it periodically.
After initializing a new graph, we switch the pointer to the new graph and wait for the GC to free the unused memory.
But the runtime never reuses all of the HeapIdle memory and instead asks the OS for new pages, increasing HeapSys indefinitely.

A basic self-contained demo: https://github.com/savalin/example/blob/master/main.go.
It's simplified code, but even this shows some memory-management strangeness.
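For reference, a minimal sketch of the allocate-and-swap pattern described above (not the linked demo itself; buildGraph is a hypothetical stand-in for the real graph construction, and the sizes are arbitrary), printing the relevant MemStats fields each iteration:

```go
package main

import (
	"fmt"
	"runtime"
)

// buildGraph is a hypothetical stand-in for the real graph construction:
// it allocates roughly 1 GiB of heap memory in 1 MiB blocks.
func buildGraph() []*[1 << 20]byte {
	blocks := make([]*[1 << 20]byte, 1024)
	for i := range blocks {
		blocks[i] = new([1 << 20]byte)
	}
	return blocks
}

func main() {
	var graph []*[1 << 20]byte
	var ms runtime.MemStats
	for i := 0; i < 20; i++ {
		graph = buildGraph() // the previous graph becomes garbage here
		runtime.ReadMemStats(&ms)
		fmt.Printf("iter=%d HeapSys=%d MiB HeapIdle=%d MiB HeapReleased=%d MiB\n",
			i, ms.HeapSys>>20, ms.HeapIdle>>20, ms.HeapReleased>>20)
	}
	_ = graph
}
```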

What did you expect to see?

Reuse of HeapIdle memory instead of requests to the OS for new memory.

What did you see instead?

HeapIdle/HeapSys grow without bound (until the OOM killer wakes up and kills the app).

For now we have downgraded Go to v1.11.13 as a workaround, but we hope to be able to use newer Go releases.

dmitshur added the NeedsInvestigation label Nov 26, 2019
dmitshur (Contributor) commented:
/cc @aclements @mknyszek

(Please let me know if there are other people who are better to ping about an issue like this.)

mknyszek (Contributor) commented:
@savalin FWIW, I can't reproduce this at tip; HeapSys stays nice and low. This leads me to an interesting conclusion. Go 1.12 introduced the policy of treating memory that has been returned to the OS separately from memory that hasn't. This policy was carried through into Go 1.13, but in Go 1.14 we no longer do this, because it would be expensive and cumbersome to do so.

The reason this decision was made in the first place was to accommodate the more aggressive mechanism for returning memory to the OS in Go 1.13 (#30333). The decision was deemed a good idea because jemalloc and other allocators follow a similar rule for returning memory to the OS. It seems like for whatever reason this decision is causing fragmentation which really hurts this application.

Anyway, it shouldn't matter for future releases. This bug is effectively fixed for the Go 1.14 release. Please give the tip of the master branch a try and see if you get the same results.

Finally, this made me realize that debug.FreeOSMemory no longer works as it's supposed to in Go 1.14, so I'll open up an issue and fix that.

CC @aclements, since this is the first consequence we're seeing of that original decision.

Please continue reading if you'd like to see some additional notes I wrote while sussing this out.

  • HeapSys is expected to never decrease, since we never unmap memory in the runtime. Instead we call madvise, which increases HeapReleased.
  • In Go 1.13 we added a more aggressive scavenging mechanism (runtime: smarter scavenging #30333) which should keep HeapSys - HeapReleased closer to your actual working set, not accounting for certain amounts of fragmentation.
  • I noticed in the other thread that you tried GODEBUG=madvdontneed=1, but here you specify darwin/amd64; note that that option only has an effect on Linux, where it properly drops RSS counters at the cost of additional page faults (but the OS lets you overcommit; it'll just evict MADV_FREE'd pages when it needs them). We don't do the equivalent RSS-dropping on darwin/amd64 because it's non-trivial, though perhaps we could now that we rely on libSystem.
  • Looking at your example, HeapIdle == HeapReleased after you call debug.FreeOSMemory, which is exactly what that call does (see the sketch after this list). I'm not sure why your OOM killer is triggering, because HeapSys is just virtual memory. Do you have ulimit -v set? Perhaps this is a Darwin overcommit thing we need to work around.
  • There were temporarily some issues in Go 1.12/1.13 around fragmentation causing HeapSys to keep growing (runtime: treap implementation of find() doesn't return the best-fit span #31616) because of a bug. We also noticed a problem in Go 1.12 if we preferred unscavenged memory over scavenged memory in the allocator, so we decided to take the best fit between the two instead, which resolved the issue.
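To illustrate the HeapIdle == HeapReleased observation from the notes above, here is a minimal sketch (the allocation sizes are arbitrary, not taken from the original demo) that forces a release with debug.FreeOSMemory and prints both counters:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	// Allocate ~256 MiB, then drop all references so it is garbage.
	garbage := make([][]byte, 256)
	for i := range garbage {
		garbage[i] = make([]byte, 1<<20)
	}
	garbage = nil

	// Force a GC and return as much memory to the OS as possible.
	debug.FreeOSMemory()

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	// After FreeOSMemory, idle heap memory should also be accounted as
	// released, so HeapIdle and HeapReleased should be roughly equal.
	fmt.Printf("HeapIdle=%d MiB HeapReleased=%d MiB\n",
		ms.HeapIdle>>20, ms.HeapReleased>>20)
}
```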

mknyszek changed the title from "HeapIdle is increasing infinitely on big data volume allocations" to "runtime: being unable to allocate across scavenged and unscavenged memory leads to fragmentation blow-up in some cases" Nov 26, 2019
mknyszek (Contributor) commented:
@savalin I will close this once you're able to verify that this is fixed for you. If you're still experiencing problems, we'll move on from there.

@gopherbot commented (this comment has been minimized).

mknyszek self-assigned this Nov 27, 2019
mknyszek removed the NeedsInvestigation label Nov 27, 2019
mknyszek added this to the Backlog milestone Nov 27, 2019
mknyszek (Contributor) commented Dec 2, 2019

@savalin My mistake. This doesn't reproduce at tip as long as you put a runtime.GC() call at the very top of the for loop (when all the heap memory is actually dead).
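For clarity, a minimal sketch of that placement (buildGraph is a hypothetical stand-in for the expensive graph construction, not the demo's actual code):

```go
package main

import "runtime"

// buildGraph is a hypothetical stand-in for the expensive graph construction.
func buildGraph() [][]byte {
	g := make([][]byte, 512)
	for i := range g {
		g[i] = make([]byte, 1<<20) // ~512 MiB total
	}
	return g
}

func main() {
	var graph [][]byte
	for i := 0; i < 10; i++ {
		// Collect at the very top of the loop, when the previous
		// iteration's graph is entirely dead.
		runtime.GC()
		graph = buildGraph()
	}
	_ = graph
}
```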

I can get it to increase the same as in #35890. What @randall77 mentioned in that one is pretty much what I'm seeing here: it caps out at around 20 GiB.

cagedmantis added the NeedsInvestigation label Jun 23, 2022