runtime: heap size reached GOMEMLIMIT but no GC activity triggered #56764

Closed
hitzhangjie opened this issue Nov 16, 2022 · 8 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
@hitzhangjie
Contributor

hitzhangjie commented Nov 16, 2022

What version of Go are you using (go version)?

$ go version
go version go1.19 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GOOS="linux"
GOPATH="/root/go"
GOROOT="/usr/local/go"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/root/dev/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2109843150=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Previously, we used a Go memory ballast to avoid frequent GC cycles when the heap is small. After upgrading to go1.19, I want to test whether we should use GOMEMLIMIT instead, given that we deploy one service per host.

I tested with GOGC=off plus GOMEMLIMIT=1GB on a host with 4 cores and 8 GB of memory. No other services compete for memory on that host.
I ran an HTTP benchmark against the service:

  • With GOGC=100 plus a 1GB ballast, I see normal GC activity, about one GC per second, and the heap stays steady at around 2GB.
  • With GOGC=off plus a 1GB GOMEMLIMIT, I saw no GC activity at all, even as the heap grew to 6.3GB, and then the process was OOM-killed.

The test ran for nearly 2 minutes. Even though GOGC=off means there is no forced GC, I think GC should still be triggered by GOMEMLIMIT=1GB, but it was not.
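For reference, the GOGC=off plus memory limit setup described above can also be applied from code. The following is a minimal sketch, not the service's actual startup code; the helper name configureGC is hypothetical, and only debug.SetGCPercent and debug.SetMemoryLimit come from the standard library (Go 1.19 or later).

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// configureGC is a hypothetical helper. It disables the GOGC-based trigger
// and installs a soft memory limit, mirroring GOGC=off plus GOMEMLIMIT.
// It returns the previous settings so callers can log or restore them.
func configureGC(limitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(-1)         // same effect as GOGC=off
	prevLimit = debug.SetMemoryLimit(limitBytes) // same effect as GOMEMLIMIT
	return prevPercent, prevLimit
}

func main() {
	prevPercent, prevLimit := configureGC(1 << 30) // 1 GiB
	fmt.Printf("previous GOGC=%d, previous limit=%d bytes\n", prevPercent, prevLimit)

	// A negative argument reads the limit back without changing it.
	fmt.Printf("effective limit=%d bytes\n", debug.SetMemoryLimit(-1))
}
```

With GOGC disabled, the memory limit is the only remaining automatic GC trigger, so logging the value read back at startup is a cheap way to confirm that the limit was really installed.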

What did you expect to see?

I expected to see at least some GC activity when the heap reached the soft memory limit.

What did you see instead?

I didn't see any GC activity. The heap grew beyond the soft memory limit (1GB) to 6.3GB, and then the process was OOM-killed.

I know there are some differences between a ballast and the soft memory limit. I am just curious why no GC activity is triggered when the heap grows beyond the limit.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 16, 2022
@hitzhangjie hitzhangjie changed the title from "runtime: heap size reached GOMEMLIMIT but no GC activity triggered, then OOM occurs" to "runtime: heap size reached GOMEMLIMIT but no GC activity triggered" Nov 16, 2022
@joedian joedian added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 16, 2022
@mknyszek
Contributor

To be clear, are you passing GOMEMLIMIT=1GB verbatim, or are you passing GOMEMLIMIT=1GiB? It shouldn't be silently failing (it should crash), but the runtime only understands GOMEMLIMIT=1GiB with the i. (Maybe we should just make GB work for power-of-two.)

However, if you're seeing that it uses the memory anyway, that's not good, though I can't trivially reproduce this locally. Do you have a reproducer by any chance?
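One way to rule out a limit that was never applied is to read it back from the running process. The sketch below assumes Go 1.19 or later; the helper name effectiveMemoryLimit is hypothetical. It relies on two documented properties of debug.SetMemoryLimit: a negative argument only returns the current limit, and the default when no limit is configured is math.MaxInt64.

```go
package main

import (
	"fmt"
	"math"
	"runtime/debug"
)

// effectiveMemoryLimit is a hypothetical helper that reports the soft memory
// limit currently in use. A negative argument to debug.SetMemoryLimit reads
// the value without changing it. math.MaxInt64 means "no limit".
func effectiveMemoryLimit() (limit int64, isSet bool) {
	limit = debug.SetMemoryLimit(-1)
	return limit, limit != math.MaxInt64
}

func main() {
	limit, isSet := effectiveMemoryLimit()
	if !isSet {
		fmt.Println("no soft memory limit is in effect")
		return
	}
	fmt.Printf("soft memory limit: %d bytes (%.2f GiB)\n", limit, float64(limit)/(1<<30))
}
```

Running this with GOMEMLIMIT=1GiB in the environment should report a limit of 1073741824 bytes; running it with no limit configured should take the first branch.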

@mknyszek mknyszek added this to the Backlog milestone Nov 16, 2022
@mknyszek mknyszek self-assigned this Nov 16, 2022
@mknyszek
Contributor

FWIW, I would expect this behavior if you were using a version of Go that didn't support GOMEMLIMIT.

@mknyszek
Contributor

Also, how are you confirming there is no GC activity? What's the output of running your program with GODEBUG=gctrace=1? Thanks.

@mknyszek
Contributor

I've been looking into this more since it's potentially pretty serious, but I haven't found any leads. Plus, we're regularly testing this behavior in the runtime and this memory limit functionality is used within Google, and we haven't seen any serious out of memory issues in production as a result of it (yet).

If you have any more information or a way to reproduce, please let me know! Putting this into WaitingForInfo for now.

@mknyszek mknyszek added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Nov 16, 2022
@hitzhangjie
Contributor Author

hitzhangjie commented Nov 17, 2022

To be clear, are you passing GOMEMLIMIT=1GB verbatim, or are you passing GOMEMLIMIT=1GiB? It shouldn't be silently failing (it should crash), but the runtime only understands GOMEMLIMIT=1GiB with the i. (Maybe we should just make GB work for power-of-two.)

However, if you're seeing that it uses the memory anyway, that's not good, though I can't trivially reproduce this locally. Do you have a reproducer by any chance?

I used GOMEMLIMIT=1GiB, not 1GB. Actually, I use debug.SetMemoryLimit(1<<30).

FWIW, I would expect this behavior if you were using a version of Go that didn't support GOMEMLIMIT.

I used go1.16.5 to compile the service and benchmarked it under the same load.

  • With only GOGC=100, it works well. Because GC was very frequent, I added a 1GB memory ballast to reduce latency.
  • With GOGC=100 plus a 1GB ballast, it works well, too. (This setup also works well with go1.19.)

ps: Why do I want to replace the ballast? Initializing the ballast is tricky: the allocated memory may be zeroed if the pages are reused, and then the ballast takes up a full 1GB of RSS.
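For readers unfamiliar with the ballast technique mentioned here, this is a minimal sketch of the usual pattern, not the code from the service in question; the helper name newBallast is hypothetical. The ballast is a large allocation that is never written to and is kept reachable for the life of the process. With GOGC=100 the heap goal is roughly twice the live heap, so a 1 GiB ballast pushes the goal to about 2 GiB, which matches the steady heap size reported earlier in the thread.

```go
package main

import (
	"fmt"
	"runtime"
)

// newBallast is a hypothetical helper that allocates a block which is never
// written to. The GC counts it as live heap, which raises the heap goal and
// makes collections less frequent.
func newBallast(size int) []byte {
	return make([]byte, size)
}

func main() {
	ballast := newBallast(1 << 30) // 1 GiB, as in the setup described above

	// The service's real work would run here.
	fmt.Printf("ballast of %d bytes allocated\n", len(ballast))

	// Keep the ballast reachable until the end of main so it is not collected.
	runtime.KeepAlive(ballast)
}
```

The RSS concern raised above applies to this pattern: the ballast stays cheap only while its pages are never touched, which the program cannot fully guarantee.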

Also, how are you confirming there is no GC activity? What's the output of running your program with GODEBUG=gctrace=1? Thanks.

Yes, I used GODEBUG=gctrace=1, and there is no output of GC activity. I also read the memstats every second and print NumGC, which stays at zero. HeapAlloc is not zero, though: it reached a large value, around 4G+, before the process was OOM-killed.
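The per-second memstats check described above could look roughly like the following. This is a sketch under assumptions: the gcSnapshot type and snapshot helper are hypothetical names, and the loop runs only three times so the example terminates, whereas a real service would poll from a background goroutine.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcSnapshot holds the two fields discussed in the thread.
type gcSnapshot struct {
	NumGC     uint32 // number of completed GC cycles
	HeapAlloc uint64 // bytes of allocated heap objects
}

// snapshot reads the current runtime memory statistics.
func snapshot() gcSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return gcSnapshot{NumGC: m.NumGC, HeapAlloc: m.HeapAlloc}
}

func main() {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()

	// Poll once per second; three iterations keep the example short.
	for i := 0; i < 3; i++ {
		<-ticker.C
		s := snapshot()
		fmt.Printf("NumGC=%d HeapAlloc=%d bytes\n", s.NumGC, s.HeapAlloc)
	}
}
```

A NumGC that stays at zero while HeapAlloc keeps climbing is the symptom reported here.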

I've been looking into this more since it's potentially pretty serious, but I haven't found any leads. Plus, we're regularly testing this behavior in the runtime and this memory limit functionality is used within Google, and we haven't seen any serious out of memory issues in production as a result of it (yet).

If you have any more information or a way to reproduce, please let me know! Putting this into WaitingForInfo for now.

OK, I will try to provide a reproducible example later.

ps: Now I have a guess: the GC is triggered by the goal calculated from GOMEMLIMIT, but the cycle does not complete. During the mark phase, newly allocated objects are marked reachable, so the RSS increases. Since this happens under an HTTP benchmark, the pressure is high, the RSS grows too large, and the process is finally killed. It's only a guess; I will test it later :)

@mknyszek
Contributor

ps: Now I have a guess: the GC is triggered by the goal calculated from GOMEMLIMIT, but the cycle does not complete. During the mark phase, newly allocated objects are marked reachable, so the RSS increases. Since this happens under an HTTP benchmark, the pressure is high, the RSS grows too large, and the process is finally killed. It's only a guess; I will test it later :)

Sorry, I'm not sure I follow. If your application allocates around 6.3 GiB in 2 minutes (as per your original post), then that's an allocation rate of about 53 MiB/s. If you have a 1 GiB ballast and your total live heap stays around 2 GiB, that suggests to me that your live heap is small, on the order of MiB. The GC should have no problem keeping up in this scenario; I would expect the mark phase to be really short. (Even then, newly allocated memory marked live during a mark phase will become eligible for reclamation next cycle provided it's not referenced by the next mark phase.)
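The allocation-rate estimate above is simple arithmetic, worked through in the sketch below. It assumes that "nearly 2 minutes" is treated as exactly 120 seconds; the function name allocRateMiBPerSec is hypothetical. The result lands between 53 and 54 MiB/s, in line with the "about 53 MiB/s" figure.

```go
package main

import "fmt"

// allocRateMiBPerSec converts a total allocation volume in GiB over a period
// in seconds into an average rate in MiB/s (1 GiB = 1024 MiB).
func allocRateMiBPerSec(totalGiB, seconds float64) float64 {
	return totalGiB * 1024 / seconds
}

func main() {
	// 6.3 GiB of heap growth over an assumed 120 seconds.
	rate := allocRateMiBPerSec(6.3, 120)
	fmt.Printf("average allocation rate: %.1f MiB/s\n", rate)
}
```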

If the application is actually getting to a mark phase, the GC is programmed to become more aggressive as the goal is neared and begins to be exceeded. It'll force the allocating goroutines to assist until it can finish.

I think a reproducer at this point would be the most useful thing.
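As an illustration of what a small check of this behavior might look like, here is a sketch that is not the reporter's reproducer. It assumes Go 1.19 or later; the names sink, churn, and gcCyclesUnderLimit are hypothetical, and the 64 MiB limit and 512 MiB of garbage are arbitrary values chosen to keep the run short. It disables GOGC, sets a soft memory limit, allocates short-lived garbage well beyond that limit, and counts how many GC cycles completed.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink is a package-level variable so the allocations below escape to the heap.
var sink []byte

// churn allocates about total bytes of short-lived garbage in chunk-sized pieces.
func churn(total, chunk int) {
	for allocated := 0; allocated < total; allocated += chunk {
		sink = make([]byte, chunk)
	}
}

// gcCyclesUnderLimit disables GOGC, installs the given soft memory limit,
// generates garbage, and returns how many GC cycles completed meanwhile.
// The previous GC settings are restored before it returns.
func gcCyclesUnderLimit(limit int64, total, chunk int) uint32 {
	prevPercent := debug.SetGCPercent(-1)
	prevLimit := debug.SetMemoryLimit(limit)
	defer debug.SetGCPercent(prevPercent)
	defer debug.SetMemoryLimit(prevLimit)

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	churn(total, chunk)
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	cycles := gcCyclesUnderLimit(64<<20, 512<<20, 1<<20)
	fmt.Printf("GC cycles with GOGC=off and a 64 MiB limit: %d\n", cycles)
}
```

With the limit in place, the cycle count should be greater than zero. If the same garbage were generated with GOGC off and no limit installed, I would expect the count to stay at zero, which is the symptom described in the original report.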

@hitzhangjie
Contributor Author

@mknyszek Thanks very much.


My bad: when I merged the code there were some conflicts, and I dropped one line of code, debug.SetMemoryLimit(...). So ... :(

I read and modified the Go runtime's mgc.go, added some debugging messages, and tested again. I realized that if this problem were caused by a runtime bug, it would be a very, very obvious one. Then I checked my own code. It's a little awkward :)

I'll close this issue. @mknyszek Thank you.

@mknyszek
Contributor

Oh, haha. It happens. :) Well, thanks for checking and for trying to reproduce!

@golang golang locked and limited conversation to collaborators Nov 17, 2023

4 participants