
runtime: the scavenger doesn't call MADV_NOHUGEPAGE like it used to #55328

Status: Closed
mknyszek opened this issue Sep 21, 2022 · 7 comments

Assignees: mknyszek
Labels: compiler/runtime, NeedsInvestigation, OS-Linux, Performance
Milestone: Go1.20

@mknyszek (Contributor)

Because the Go runtime's background scavenger releases memory in 64 KiB chunks, the heuristic in mem_linux.go's implementation of sysUnusedOS never actually triggers a call to MADV_NOHUGEPAGE to break up huge pages that would otherwise keep a lot of memory around. That heuristic exists to avoid calling MADV_NOHUGEPAGE on every sysUnusedOS, which would create too many VMAs (Linux has a low per-process limit).

The result is that Linux may be transparently keeping huge pages around, using much more memory than intended and reducing the effectiveness of the scavenger.

I think the fix for this is to mark all new heap memory as MADV_NOHUGEPAGE initially, and then only call it again in circumstances where we free memory that is likely to have had MADV_HUGEPAGE called on it (e.g. a large object that contains at least one aligned 2 MiB region will have this happen via sysUsed). However, this might not be enough, and needs some more investigation.
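
For anyone unfamiliar with the mechanism, the stand-alone sketch below shows what "mark all new heap memory as MADV_NOHUGEPAGE initially" boils down to at the syscall level. It uses golang.org/x/sys/unix for illustration; the runtime itself goes through its own low-level madvise wrapper in mem_linux.go, and the region size here is made up.

```go
// Illustrative sketch only: map an anonymous region (as sysAlloc/sysMap
// would) and immediately mark it MADV_NOHUGEPAGE, so the kernel never backs
// it with transparent huge pages until we explicitly ask for them.
package main

import (
	"fmt"
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	const size = 64 << 20 // hypothetical 64 MiB "heap" region

	// Reserve and commit anonymous memory.
	mem, err := unix.Mmap(-1, 0, size,
		unix.PROT_READ|unix.PROT_WRITE,
		unix.MAP_ANON|unix.MAP_PRIVATE)
	if err != nil {
		log.Fatalf("mmap: %v", err)
	}
	defer unix.Munmap(mem)

	// Mark the whole region MADV_NOHUGEPAGE up front, instead of waiting
	// for a later sysUnusedOS call that may never trigger the heuristic.
	if err := unix.Madvise(mem, unix.MADV_NOHUGEPAGE); err != nil {
		log.Fatalf("madvise(MADV_NOHUGEPAGE): %v", err)
	}

	fmt.Println("region mapped with transparent huge pages disabled")
}
```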

CC @golang/runtime

@mknyszek added the Performance, OS-Linux, and NeedsInvestigation labels on Sep 21, 2022
@mknyszek added this to the Go1.20 milestone on Sep 21, 2022
@mknyszek self-assigned this on Sep 21, 2022
@gopherbot added the compiler/runtime label on Sep 21, 2022
@mknyszek (Contributor, Author)

Thinking about this more, what my proposed solution loses is the benefit of backing dense parts of the heap with huge pages.

However, back when we worked on the scavenger, there was a key insight about having a GC'd heap: as we reach the end of a GC cycle, the whole heap is always densely packed. I wonder if we should say something like "back the heap with MADV_HUGEPAGE up to the heap goal." Fragmentation leaves a non-huge-page tail, but that's OK.

At first I thought the goroutine doing mark termination could spend a little extra time post-STW going over the (mostly contiguous) heap and calling MADV_HUGEPAGE and MADV_NOHUGEPAGE a bunch of times. But maybe only the MADV_HUGEPAGE part should happen there, with the scavenger taking care of MADV_NOHUGEPAGE as it walks down the heap from the highest addresses. That introduces a delay, but it gives the scavenger a chance to call MADV_DONTNEED on huge pages, which is considerably faster. This may not be worth it, however; it's not hard to try both approaches.

This is in effect a very simple "huge page aware" allocation policy without needing to do any special bin-packing. We're taking advantage of the fact that we can count on heap density, thanks to the GC.
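
A rough sketch of what that split could look like, purely for illustration. The helper name, the flat []byte view of the heap, and the use of golang.org/x/sys/unix are all assumptions for the example; a real change would live next to the page allocator and use the runtime's own madvise wrapper.

```go
// Hypothetical sketch of the "back the heap with huge pages up to the heap
// goal" idea. None of these names exist in the runtime.
package hugepolicy

import "golang.org/x/sys/unix"

const hugePageSize = 2 << 20 // 2 MiB transparent huge page size on x86-64

// applyHeapGoalPolicy marks the dense prefix of the heap region (everything
// below heapGoal, rounded down to a huge page boundary) as MADV_HUGEPAGE and
// the fragmented tail as MADV_NOHUGEPAGE, so the scavenger can release the
// tail without breaking up huge pages in the dense part.
func applyHeapGoalPolicy(heap []byte, heapGoal uintptr) error {
	cut := heapGoal
	if cut > uintptr(len(heap)) {
		cut = uintptr(len(heap))
	}
	cut &^= hugePageSize - 1 // align down to a huge page boundary

	if cut > 0 {
		if err := unix.Madvise(heap[:cut], unix.MADV_HUGEPAGE); err != nil {
			return err
		}
	}
	if cut < uintptr(len(heap)) {
		if err := unix.Madvise(heap[cut:], unix.MADV_NOHUGEPAGE); err != nil {
			return err
		}
	}
	return nil
}
```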

@mknyszek (Contributor, Author)

The idea in the last comment doesn't really work. Well, it kind of does: it definitely reduces the number of VMAs, but it causes a small performance regression. The issue is that the scavenger still breaks up huge pages while the heap is growing, which I partially mitigated by adding a heuristic that keeps the scavenger off while the heap is growing.

This wouldn't be so bad if the huge pages were collapsed again lazily, but because we set MADV_HUGEPAGE explicitly, and Linux's defrag setting (at least on my machine) is "defer+madvise", these huge pages are collapsed immediately. This idea also had issues with flapping at the boundary of the huge-page-backed region, and it tended to increase memory use somewhat because, as it turns out, for small heaps the scavenger does a decent job of returning fragmentation to the OS.

So, new idea: track page alloc chunk occupancy. If it exceeds 85%, mark the chunk MADV_HUGEPAGE and make it invisible to the scavenger. If occupancy later drops below 75%, make it visible to the scavenger again. The scavenger also marks a chunk MADV_NOHUGEPAGE when it first decides to scavenge there, if the chunk was previously marked MADV_HUGEPAGE. (Chunks are 4 MiB, so typically 2x larger than huge pages, but that's not the worst proxy, and it's a bit easier to prototype as well.)

This is much more direct: we're just trying to back the high-occupancy chunks with huge pages, exactly where we get the most benefit.
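
To make the hysteresis concrete, here's a minimal sketch of the per-chunk bookkeeping described above. The type, field names, and thresholds-as-page-counts are invented for illustration and are not the runtime's actual chunk metadata representation.

```go
// Sketch of the occupancy-hysteresis idea: dense chunks get huge pages and
// are hidden from the scavenger; chunks that fall back below a lower
// threshold become scavengable again.
package hugepolicy

const (
	pagesPerChunk    = 512 // 4 MiB chunk / 8 KiB pages
	hugeThreshold    = 435 // ~85% occupancy: mark MADV_HUGEPAGE, hide from the scavenger
	releaseThreshold = 384 // ~75% occupancy: make the chunk visible to the scavenger again
)

// chunkState tracks whether a 4 MiB page-alloc chunk is currently backed by
// huge pages and whether the scavenger is allowed to release pages from it.
type chunkState struct {
	inUsePages  int
	hugeBacked  bool // MADV_HUGEPAGE has been applied
	scavengable bool // the scavenger may release pages from this chunk
}

// noteOccupancy applies the hysteresis: crossing ~85% turns huge pages on
// and hides the chunk from the scavenger; dropping below ~75% re-exposes it.
// The scavenger itself marks a previously huge-backed chunk MADV_NOHUGEPAGE
// the first time it decides to scavenge there (not shown here).
func (c *chunkState) noteOccupancy(inUse int) {
	c.inUsePages = inUse
	switch {
	case inUse >= hugeThreshold:
		// Dense: back with huge pages and hide from the scavenger.
		c.hugeBacked = true
		c.scavengable = false
	case inUse < releaseThreshold:
		// Sparse again: let the scavenger release pages here.
		c.scavengable = true
	}
	// Between the two thresholds nothing changes, which avoids flapping.
}
```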

@gopherbot
Change https://go.dev/cl/436395 mentions this issue: runtime: use and discard hugepages more explicitly

@mknyszek (Contributor, Author)

I think the latest iteration of https://go.dev/cl/436395 implements a decent huge page policy. It leverages both the occupancy heuristic and general assumptions about heap density. 4 MiB chunks that are consistently densely packed (96%+ occupancy, because MADV_HUGEPAGE is so expensive) are marked MADV_HUGEPAGE immediately. If a chunk ceases being dense and remains that way through the next GC cycle, we can reasonably assume it will stay sparse, so the background scavenger becomes allowed to release that memory and eagerly mark it MADV_NOHUGEPAGE.

Scavenging for the memory limit or debug.FreeOSMemory forces all memory to be available for scavenging. An extra part of this policy I want to add is that memory scavenged this way stays MADV_NOHUGEPAGE at least until the following GC cycle. This ensures those two mechanisms are as effective as they can be while still trying to reduce churn (just in the other direction).

https://perf.golang.org/search?q=upload:20221110.8
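
For concreteness, here's a rough model of the policy described in this comment, not the actual code in the CL. All names and the GC-generation bookkeeping are invented for illustration; the real change tracks this state in the page allocator's chunk metadata.

```go
// Rough model of the "dense chunks get huge pages; release only after a
// full non-dense GC cycle" policy.
package hugepolicy

const denseThreshold = 492 // ~96% of the 512 pages in a 4 MiB chunk

type hugeChunk struct {
	hugeBacked   bool   // MADV_HUGEPAGE has been applied to this chunk
	lastDenseGen uint32 // GC cycle in which the chunk was last ~96%+ occupied
	noHugeUntil  uint32 // forced scavenging pins MADV_NOHUGEPAGE until this GC cycle
}

// noteOccupancy is called as the chunk's occupancy changes. Consistently
// dense chunks are backed by huge pages immediately.
func (c *hugeChunk) noteOccupancy(inUsePages int, gcGen uint32) {
	if inUsePages >= denseThreshold && gcGen >= c.noHugeUntil {
		c.lastDenseGen = gcGen
		c.hugeBacked = true // would call madvise(MADV_HUGEPAGE) here
	}
}

// scavengable reports whether the background scavenger may release memory
// from this chunk (and eagerly mark it MADV_NOHUGEPAGE): only once the
// chunk has stayed non-dense through an entire GC cycle.
func (c *hugeChunk) scavengable(gcGen uint32) bool {
	return !c.hugeBacked || gcGen > c.lastDenseGen+1
}

// forceScavenge models scavenging for the memory limit or
// debug.FreeOSMemory: the memory is released unconditionally and the chunk
// is kept MADV_NOHUGEPAGE at least until the following GC cycle.
func (c *hugeChunk) forceScavenge(gcGen uint32) {
	c.hugeBacked = false // would call madvise(MADV_NOHUGEPAGE) here
	c.noHugeUntil = gcGen + 1
}
```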

@mknyszek (Contributor, Author)

(The purely occupancy-based implementation had churn issues as well.)

@gopherbot

Change https://go.dev/cl/460316 mentions this issue: runtime: disable huge pages for GC metadata for small heaps

gopherbot pushed a commit that referenced this issue Apr 19, 2023
For #55328.

Change-Id: I8792161f09906c08d506cc0ace9d07e76ec6baa6
Reviewed-on: https://go-review.googlesource.com/c/go/+/460316
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot

Change https://go.dev/cl/526615 mentions this issue: _content/doc: discuss transparent huge pages in the GC guide

gopherbot pushed a commit to golang/website that referenced this issue Sep 11, 2023
For golang/go#8832.
For golang/go#55328.
For golang/go#61718.

Change-Id: I1ee51424dc2591a84f09ca8687c113f0af3550d1
Reviewed-on: https://go-review.googlesource.com/c/website/+/526615
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>