Memory not being reclaimed by OS #41444

Closed
Pungyeon opened this issue Sep 17, 2020 · 11 comments
Labels
FrozenDueToAge, WaitingForInfo (Issue is not actionable because of missing required information, which needs to be provided.)

Comments

@Pungyeon

What version of Go are you using (go version)?

We have reproduced this behaviour with Go 1.13, 1.14, and 1.15.

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

We have tried several different Linux distributions, mainly Ubuntu and Alpine; they all behave alike.

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build014469751=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We have written an application that processes VoIP traffic from the network using the gopacket library. The application parses VoIP streams and decodes the associated RTP traffic. All data associated with ongoing calls is stored in memory, and all completed calls are stored on disk. However, when sending traffic through our application, we noticed that memory is never reclaimed by the OS. We initially thought this was a memory leak, but after extensive investigation concluded that it is not.

We stumbled upon the following article, which described similar behaviour: https://blog.detectify.com/2019/09/05/how-we-tracked-down-a-memory-leak-in-one-of-our-go-microservices/

We tried implementing a similar fix, and it actually seemed to work. Using the environment variable GODEBUG=madvdontneed=1 together with running debug.FreeOSMemory enabled the OS to reclaim the memory and avoid an eventual OOM error.

I unfortunately cannot share the code at the current time, since it is still proprietary software 😢 But I may be able to share a video demonstration of this exact issue, if that would help in any way.
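
While I can't share the real code, the mitigation boils down to roughly the sketch below (a minimal illustration only, not our actual code: the process is started with GODEBUG=madvdontneed=1, and the one-minute interval is arbitrary):

```go
// Minimal sketch of the mitigation: start the process with
// GODEBUG=madvdontneed=1 and periodically ask the runtime to hand
// freed memory back to the OS. The interval is purely illustrative.
package main

import (
	"runtime/debug"
	"time"
)

func freeOSMemoryLoop(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		// Forces a GC and then attempts to return as much memory to the
		// OS as possible; with madvdontneed=1 the pages are released via
		// MADV_DONTNEED, so RSS drops immediately.
		debug.FreeOSMemory()
	}
}

func main() {
	go freeOSMemoryLoop(time.Minute)
	// ... start packet capture / RTP decoding here ...
	select {} // block forever in this sketch
}
```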

What did you expect to see?

Once a call session is over, all data is cleared from memory. We expected to see the OS reclaiming this memory.

What did you see instead?

The OS never reclaims this memory and we see a steady increase in memory usage of our application (without the fix described above).

@davecheney
Contributor

The OS never reclaims this memory and we see a steady increase in memory usage of our application (without the fix described above).

Can you run your application with GODEBUG=gctrace=1 up to the point where it is killed by the operating system and paste the complete output in this issue? Thank you.

@davecheney added the WaitingForInfo label (Issue is not actionable because of missing required information, which needs to be provided.) on Sep 17, 2020
@Pungyeon
Author

Thank you so much @davecheney for the extremely quick response. I will return with this information as soon as possible 🙏

@Pungyeon
Author

Here is a link with two different traces, I hope the names are sufficiently descriptive :) https://lmjcorti-my.sharepoint.com/:f:/g/personal/infrastructure_corti_ai/EtXO0HhmtIdBhrusGcjdkhQB4INvplcJyR50eR9NGP9iyg?e=KNPC68

I have run this in the official golang:1.15 Docker container, with 2 GB of memory allocated. Since the leak is rather minor, I didn't wait for an OOM, but I hope the logs are still useful. If not, let me know and I will rerun the tests and produce new output.

Thank you again.

@davecheney
Contributor

Thank you for this data. Can I ask you to use go build, not go run, as the trace for go run is conflated with the trace for your application?

@davecheney
Contributor

Having said that, it looks like your application's heap is around 25 MB, based on this line:

gc 11 @600.379s 0%: 0.062+13+0.025 ms clock, 0.12+0/6.4/14+0.051 ms cpu, 13->13->12 MB, 25 MB goal, 2

Does this match with your measurements?

@aclements
Member

@Pungyeon, it's not clear from your original post how you're observing that memory is not being reclaimed by the OS. Are you observing this from memory statistics reported by the OS, and if so, which ones (RSS or something else)? Or are you observing OOMs?

What happens if you set GODEBUG=madvdontneed=1 without using debug.FreeOSMemory?

What does runtime.ReadMemStats report when your application is having memory trouble? In particular, what are HeapSys, HeapInuse, and HeapReleased?
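
For concreteness, a minimal sketch of how those numbers could be logged (the fields are from runtime.MemStats; the 30-second sampling interval is arbitrary):

```go
// Minimal sketch: periodically log the heap statistics asked for above.
package main

import (
	"log"
	"runtime"
	"time"
)

func logHeapStats() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	log.Printf("HeapSys=%d MiB HeapInuse=%d MiB HeapIdle=%d MiB HeapReleased=%d MiB",
		m.HeapSys>>20, m.HeapInuse>>20, m.HeapIdle>>20, m.HeapReleased>>20)
}

func main() {
	for {
		logHeapStats()
		time.Sleep(30 * time.Second) // arbitrary sampling interval
	}
}
```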

Go defaults to using MADV_FREE, which marks memory that can be reclaimed by the OS, but that memory doesn't get reclaimed until the OS is actually under pressure. Hence, it can look like your application has a large RSS, but that can shrink rapidly if the OS needs that memory. I ask about OOMs because the OS should definitely reclaim the memory before OOMing your process, so if your process is OOMing, that suggests a different issue.

@gopherbot

Change https://golang.org/cl/267100 mentions this issue: runtime: default to MADV_DONTNEED on Linux

gopherbot pushed a commit that referenced this issue Nov 2, 2020
In Go 1.12, we changed the runtime to use MADV_FREE when available on
Linux (falling back to MADV_DONTNEED) in CL 135395 to address issue
#23687. While MADV_FREE is somewhat faster than MADV_DONTNEED, it
doesn't affect many of the statistics that MADV_DONTNEED does until
the memory is actually reclaimed under OS memory pressure. This
generally leads to poor user experience, like confusing stats in top
and other monitoring tools; and bad integration with management
systems that respond to memory usage.

We've seen numerous issues about this user experience, including
#41818, #39295, #37585, #33376, and #30904, many questions on Go
mailing lists, and requests for mechanisms to change this behavior at
run-time, such as #40870. There are also issues that may be a result
of this, but root-causing it can be difficult, such as #41444 and
#39174. And there's some evidence it may even be incompatible with
Android's process management in #37569.

This CL changes the default to prefer MADV_DONTNEED over MADV_FREE, to
favor user-friendliness and minimal surprise over performance. I think
it's become clear that Linux's implementation of MADV_FREE ultimately
doesn't meet our needs. We've also made many improvements to the
scavenger since Go 1.12. In particular, it is now far more prompt and
it is self-paced, so it will simply trickle memory back to the system
a little more slowly with this change. This can still be overridden by
setting GODEBUG=madvdontneed=0.

Fixes #42330 (meta-issue).

Fixes #41818, #39295, #37585, #33376, #30904 (many of which were
already closed as "working as intended").

Change-Id: Ib6aa7f2dc8419b32516cc5a5fc402faf576c92e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/267100
Trust: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@Pungyeon
Author

Pungyeon commented Nov 2, 2020

Hi everyone 👋 Really, really sorry for the radio silence 😳 I am honestly quite embarrassed about this.

Thank you for this data. Can I ask you to use go build, not go run, as the trace for go run is conflated with the trace for your application?

@davecheney I will try to do this tomorrow and get back to you as soon as possible.

it's not clear from your original post how you're observing that memory is not being reclaimed by the OS. Are you observing this from memory statistics reported by the OS, and if so, which ones (RSS or something else)? Or are you observing OOMs?

@aclements we were experiencing OOMs on some of our customer deployments, possibly due to this issue. However, we were able to mitigate this by using the combination of GODEBUG=madvdontneed=1 and debug.FreeOSMemory (running it at intervals). This isn't really ideal, which is why I opened this issue.

Here is an internal video of the testing that led to our final solution. Perhaps it's a misunderstanding of memory management in Go on our part... If so, please let me know what else to do 🙏 https://drive.google.com/file/d/1si2BltEeESERp3W7bXSL7tSKRzoB5k_8/view?usp=sharing

@aclements
Member

@aclements we were experiencing OOMs on some of our customer deployments, possibly due to this issue. However, we were able to mitigate this by using the combination of GODEBUG=madvdontneed=1 and debug.FreeOSMemory (running it at intervals). This isn't really ideal, which is why I opened this issue.

GODEBUG=madvdontneed=1 shouldn't have any effect on OOMs. The fact that debug.FreeOSMemory helped suggests to me your application was actually running pretty close to memory limits and this just helped nudge it into a safe amount. Since the Go garbage collector is designed to retain about twice as much memory as your live heap size, debug.FreeOSMemory is expected to free about half of your memory, but the moment you start allocating again after freeing memory, it will grow back into that 2x envelope. This isn't necessarily a bad thing to do if you know your process is going to be idle for a long time, or if you know it just released a lot of memory, but it's not meant to fix OOMs.
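
To make that envelope concrete, a small self-contained experiment along these lines (the 512 MiB allocation size is arbitrary) shows how much memory the runtime retains after a collection and what debug.FreeOSMemory changes:

```go
// Small, self-contained experiment: allocate a large buffer, drop it,
// and watch how much memory the runtime keeps versus returns to the OS.
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func dump(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// With the default GOGC=100, NextGC is roughly twice the live heap.
	fmt.Printf("%-11s HeapAlloc=%4d MiB NextGC=%4d MiB HeapIdle=%4d MiB HeapReleased=%4d MiB\n",
		label, m.HeapAlloc>>20, m.NextGC>>20, m.HeapIdle>>20, m.HeapReleased>>20)
}

func main() {
	buf := make([]byte, 512<<20) // arbitrary "large call buffer"
	for i := range buf {
		buf[i] = 1 // touch the pages so they are actually committed
	}
	dump("allocated")

	buf = nil // drop the only reference so the GC can collect it
	runtime.GC()
	dump("after GC") // freed, but mostly still held by the runtime (HeapIdle high)

	debug.FreeOSMemory()
	dump("after free") // HeapReleased jumps; with madvdontneed=1 RSS drops immediately
}
```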

Here is an internal video of the testing that led to our final solution. Perhaps it's a misunderstanding of memory management in Go on our part... If so, please let me know what else to do 🙏 https://drive.google.com/file/d/1si2BltEeESERp3W7bXSL7tSKRzoB5k_8/view?usp=sharing

Without knowing what Docker is reporting as "memory usage" in that monitoring tool, it's hard to say exactly what's going on here (I hate it when things are just labeled "memory usage"; there are so many different things it can mean). But the 2x drop from FreeOSMemory isn't a surprise, as I explained above. The fact that it rises with madvdontneed=0 but levels out with madvdontneed=1 suggests that this is indeed just a monitoring problem, and that "memory usage" includes memory that can be reclaimed at any time by the OS if there's pressure.

@Pungyeon
Author

Pungyeon commented Nov 2, 2020

Without knowing what Docker is reporting as "memory usage" in that monitoring tool, it's hard to say exactly what's going on here (I hate it when things are just labeled "memory usage"; there are so many different things it can mean)

@aclements I absolutely agree; the ambiguity of a lot of the tools we have used while attempting to debug this has not been helpful 😅 Either way, if this is expected behaviour, then let us close this issue. I will do some further testing and report back once I have something to report.

Thank you so much for your assistance 🙏

@aclements
Member

Glad we were able to help! FYI, we're currently planning to default to madvdontneed=1 in Go 1.16 because of these sorts of confusing monitoring issues (#42330).

@golang golang locked and limited conversation to collaborators Nov 2, 2021