
Abnormal memory usage on MIPS platform with GOMIPS=softfloat #39174

Closed
Gh0u1L5 opened this issue May 20, 2020 · 11 comments
Labels
FrozenDueToAge, WaitingForInfo (issue is not actionable because of missing required information, which needs to be provided)
Milestone

Comments

@Gh0u1L5

Gh0u1L5 commented May 20, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="mipsle"
GOBIN=""
GOCACHE="/home/build/.cache/go-build"
GOENV="/home/build/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/build/Go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.14"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.14/pkg/tool/linux_amd64"
GCCGO="/usr/bin/gccgo"
GOMIPS="softfloat"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -mabi=32 -march=mips32 -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build072440335=/tmp/go-build -gno-record-gcc-switches"

What did you do?

  1. I compiled the following hello world program:
package main

import "fmt"

func main() {
    fmt.Println("Hello")
    for true {}
}
  2. I uploaded the compiled binary to my OpenWRT router, and executed the binary.

What did you expect to see?

The memory usage should be a reasonable number, e.g. 2~3MB.

What did you see instead?

The top program reported a crazy number (~650MB), and the program got killed by the OOM-killer later.

  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 6339  1120 root     R     647m 541%  23% ./Hello

And More

So I did some tests and got the following interesting observations:

  1. It has nothing to do with the kernel version. I tried downgrading the kernel from 5.4.41 to 4.14.180; it makes no difference.

  2. I tried GOGC=<some number>; it makes no difference.

  3. I tried GODEBUG=madvdontneed=1 and GODEBUG=asyncpreemptoff=1, because I noticed these features had caused some memory issues before. However, I had no luck with either of these.

  4. I haven't tried GOARCH=mips or GOMIPS=hardfloat yet, because that OpenWRT router is the only MIPS device I have on hand, and it doesn't support these options.

  5. I eventually found a temporary workaround: ulimit -v 32768, i.e. limiting the virtual memory usage to at most 32MB. After applying this limit, I got a very reasonable memory usage:

  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 6415  1120 root     R     6676   5%  25% ./Hello
  6. BTW, my router has only 120MB of memory, so this ~600MB memory usage makes absolutely no sense to me.
@mknyszek
Contributor

This is expected (VSZ reporting ~600 MiB), assuming that first line was generated on linux/amd64. Your issue title suggests that this is a problem on mips, but you talk about linux/amd64 earlier. Could you clarify which numbers were generated on which platforms, and where the OOM happened?

As of Go 1.14 the runtime initializes some structures by making large virtual mappings. Note that it does not actually ever map that memory unless it's needed (and how much is needed is proportional to your heap size), so Go 1.14 should not use much more physical memory than Go 1.13. If your process was OOM-killed it may be due to how your overcommit settings are configured (though in my experiments, I found that Linux really doesn't charge you for simply reserving address space, regardless of the overcommit setting), or it could be something else (a low ulimit for example). Are you positive your program was OOM-killed due to virtual memory usage?

I do not, in general, recommend using ulimit -v because these days virtual memory on most platforms is ~free, so virtual memory use is generally not a good proxy for limiting how much physical memory a process can use. 32-bit platforms are an exception, but we don't make such large virtual mappings on 32-bit. If you're seeing large mappings on 32-bit, that's a problem.
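
To make the distinction above concrete, here is a minimal diagnostic sketch (not part of the original thread; it only uses standard runtime.MemStats fields) that could be run on the device to compare the address space the runtime has reserved against the heap memory it actually has in use or has already returned to the OS:

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    for {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // Sys counts all virtual address space reserved from the OS;
        // HeapInuse is heap memory actually backed by in-use spans,
        // and HeapReleased is memory already returned to the OS.
        fmt.Printf("Sys=%dMiB HeapSys=%dMiB HeapInuse=%dMiB HeapReleased=%dMiB\n",
            m.Sys>>20, m.HeapSys>>20, m.HeapInuse>>20, m.HeapReleased>>20)
        time.Sleep(5 * time.Second)
    }
}

Comparing these numbers against RSS from top (rather than VSZ) is what actually reflects physical memory pressure.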

@mknyszek mknyszek added the WaitingForInfo label May 20, 2020
@mknyszek mknyszek added this to the Go1.15 milestone May 20, 2020
@Gh0u1L5
Author

Gh0u1L5 commented May 20, 2020

So, what I'm doing is cross-compiling a binary for a MIPS device (an OpenWRT router). If you check the go env output, you will notice that GOHOSTARCH is amd64 and GOARCH is mipsle. The OOM happens when I execute the compiled binary on the MIPS platform.

The explanation about virtual mappings makes sense to me. I double-checked the log and noticed that the hello world program actually never got killed by the OOM-killer. The one that got killed is the program I'm actually working on, a traffic forwarding program written in Go.

So, I guess what is actually happening is: the traffic forwarding program can see a large chunk of virtual memory (~650MB) without knowing that the physical memory is merely 120MB, so it keeps claiming new memory without triggering a GC. Eventually it hits the hard limit, the OOM-killer jumps in, and the program crashes. Does this guess make sense to you?

@mknyszek
Contributor

So, what I'm doing is cross-compiling a binary for a MIPS device (an OpenWRT router). If you check the go env output, you will notice that GOHOSTARCH is amd64 and GOARCH is mipsle. The OOM happens when I execute the compiled binary on the MIPS platform.

I see... then it is somewhat surprising to me that there's such a large mapping on mipsle, which is 32-bit. There may be a real issue here since there isn't much address space on 32-bit.

The explanation about virtual mappings makes sense to me. I double-checked the log and noticed that the hello world program actually never got killed by the OOM-killer. The one that got killed is the program I'm actually working on, a traffic forwarding program written in Go.

Got it, thank you for the clarification.

So, I guess what is actually happening is: the traffic forwarding program can see a large chunk of virtual memory (~650MB) without knowing that the physical memory is merely 120MB, so it keeps claiming new memory without triggering a GC. Eventually it hits the hard limit, the OOM-killer jumps in, and the program crashes. Does this guess make sense to you?

I'm not sure I follow. Does the traffic forwarding program explicitly reference the virtual memory usage for some kind of throttling mechanism? And I'm not sure what you mean by triggering a GC, either; if you have GOGC set to e.g. 100, then GCs will trigger when the heap reaches double the live heap at the end of the last GC (which is all tracked by updating metadata when allocations are made; it doesn't read any OS-reported statistics). It's not triggered based on a memory maximum or anything.
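
As an illustration of that pacing, here is a small sketch (not from the original thread; it assumes the default GOGC=100): after each forced collection, the runtime's NextGC target sits roughly GOGC% above the live heap, independent of any OS-reported statistics.

package main

import (
    "fmt"
    "runtime"
)

var sink [][]byte // keeps allocations live so the heap actually grows

func main() {
    for i := 0; i < 5; i++ {
        sink = append(sink, make([]byte, 10<<20)) // grow the live heap by ~10MiB
        runtime.GC()

        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // With GOGC=100, NextGC is roughly twice the live heap measured
        // at the end of the previous cycle.
        fmt.Printf("live heap ~%dMiB, next GC target ~%dMiB\n",
            m.HeapAlloc>>20, m.NextGC>>20)
    }
}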

@Gh0u1L5
Author

Gh0u1L5 commented May 21, 2020

So, I guess what is actually happening is: the traffic forwarding program can see a large chunk of virtual memory (~650MB) without knowing that the physical memory is merely 120MB, so it keeps claiming new memory without triggering a GC. Eventually it hits the hard limit, the OOM-killer jumps in, and the program crashes. Does this guess make sense to you?

I'm not sure I follow. Does the traffic forwarding program explicitly reference the virtual memory usage for some kind of throttling mechanism? And I'm not sure what you mean by triggering a GC, either; if you have GOGC set to e.g. 100, then GCs will trigger when the heap reaches double the live heap at the end of the last GC (which is all tracked by updating metadata when allocations are made; it doesn't read any OS-reported statistics). It's not triggered based on a memory maximum or anything.

Hmm, interesting. I didn't know that the GC in Go is triggered by heap growth like that. Then how does ulimit help relieve my issue?

@Gh0u1L5
Author

Gh0u1L5 commented May 23, 2020

Update: I thought the ulimit setting helped relieve the issue, but it turns out that what really helped were some iptables rules set by another team. These rules redirect a lot of connections to another router, which means the traffic forwarding program now experiences far less pressure. So I guess the OOM issue is mainly caused by poor memory optimization in the program itself; I'll spend some time working on that.

In the meantime, I'll keep the issue open until you figure out whether 650MB of virtual memory is normal for mipsle or not. Good luck and have a nice weekend!

@ianlancetaylor
Contributor

CC @cherrymui

@ianlancetaylor ianlancetaylor modified the milestones: Go1.15, Backlog Jun 26, 2020
@cherrymui
Member

As @mknyszek replied, the virtual memory mapping size is probably expected. I see similar numbers on 32-bit 386. As that memory is not actually paged in, I'm not sure it causes any actual problem. I think softfloat is probably not related here (unless there is some bug in the softfloat implementation that causes the memory size calculation to go wrong).

That said, it is still a general problem for how to run Go programs in a memory-limited environment (related: #29696).
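
For reference, a common mitigation pattern in such memory-limited environments is to run with a lower GOGC and periodically return freed memory to the OS. The following is a hedged sketch of that general practice, not something prescribed in this issue:

package main

import (
    "runtime/debug"
    "time"
)

func main() {
    // Equivalent to running with GOGC=30: trigger a GC once the heap grows
    // 30% past the live heap, instead of the default 100%.
    debug.SetGCPercent(30)

    go func() {
        for range time.Tick(time.Minute) {
            // Forces a GC and asks the runtime to return as much unused
            // memory to the OS as it can.
            debug.FreeOSMemory()
        }
    }()

    select {} // stand-in for the real workload
}

Both knobs trade CPU time for a smaller resident heap, which is usually the right trade-off on a 120MB router.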

@gopherbot

Change https://golang.org/cl/267100 mentions this issue: runtime: default to MADV_DONTNEED on Linux

gopherbot pushed a commit that referenced this issue Nov 2, 2020
In Go 1.12, we changed the runtime to use MADV_FREE when available on
Linux (falling back to MADV_DONTNEED) in CL 135395 to address issue
 #23687. While MADV_FREE is somewhat faster than MADV_DONTNEED, it
doesn't affect many of the statistics that MADV_DONTNEED does until
the memory is actually reclaimed under OS memory pressure. This
generally leads to poor user experience, like confusing stats in top
and other monitoring tools; and bad integration with management
systems that respond to memory usage.

We've seen numerous issues about this user experience, including
 #41818, #39295, #37585, #33376, and #30904, many questions on Go
mailing lists, and requests for mechanisms to change this behavior at
run-time, such as #40870. There are also issues that may be a result
of this, but root-causing it can be difficult, such as #41444 and
 #39174. And there's some evidence it may even be incompatible with
Android's process management in #37569.

This CL changes the default to prefer MADV_DONTNEED over MADV_FREE, to
favor user-friendliness and minimal surprise over performance. I think
it's become clear that Linux's implementation of MADV_FREE ultimately
doesn't meet our needs. We've also made many improvements to the
scavenger since Go 1.12. In particular, it is now far more prompt and
it is self-paced, so it will simply trickle memory back to the system
a little more slowly with this change. This can still be overridden by
setting GODEBUG=madvdontneed=0.

Fixes #42330 (meta-issue).

Fixes #41818, #39295, #37585, #33376, #30904 (many of which were
already closed as "working as intended").

Change-Id: Ib6aa7f2dc8419b32516cc5a5fc402faf576c92e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/267100
Trust: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@changkun
Member

changkun commented Jan 5, 2021

Because 05e6d28 has landed on master, this issue can be closed.

@Gh0u1L5
Author

Gh0u1L5 commented Jan 7, 2021

@changkun 👍 Thanks a lot, wish you a nice day.

@ghost

ghost commented Dec 3, 2021

Because 05e6d28 has landed on master, this issue can be closed.

That commit is not relevant. OP already said:

3. I tried GODEBUG=madvdontneed=1 and GODEBUG=asyncpreemptoff=1, because I noticed these features had caused some memory issues before. However, I had no luck with either of these.

If he tried setting GODEBUG=madvdontneed=1 and it didn't fix the issue, I don't see how GODEBUG=madvdontneed=1 being the default (the commit you reference) is going to fix it.

For what it's worth, I have this exact same issue and it is driving me nuts. Three identical MIPS machines running the same golang program; one of them OOMs after two hours, one of them OOMs about once a day, and the third one seems unaffected.

Closing GC issues as "WORKSFORME" is not cool. GC is inherently a nondeterministic thing. You should expect even buggy GCs to work properly for lots of people.
