
runtime: make the page allocator scale #35112

Closed
mknyszek opened this issue Oct 23, 2019 · 42 comments

@mknyszek
Contributor

mknyszek commented Oct 23, 2019

Over the course of the last year, we've seen several cases where making relatively minor changes to the allocator slow path, which allocates pages from the heap, caused serious performance issues (#32828, #31678). The problem stemmed largely from contention on the heap lock: a central m-based spinlock (so we can't schedule another g while it's waiting!) which guards nearly all operations on the page heap. Since Go 1.11 (and likely earlier), we've observed barging behavior on this lock in applications which frequently allocate larger objects (~1 KiB), which indicates a collapsing lock. Furthermore, these applications tend to stay the same or worsen in performance as the number of available cores on the machine increases. This represents a fundamental scalability bottleneck in the runtime.

Currently the page allocator is based on a treap (and formerly had a small linked-list based cache for smaller free spans), but I propose we rethink it and rewrite it as something that:

  1. Is more cache-friendly, branch-predictor friendly, etc. on the whole.
  2. Has a lockless fast path.

The former just makes the page allocator faster and less likely to fall into bad paths in the microarchitecture, while the latter directly reduces contention on the lock.

While increased performance across the board is what we want, what we're most interested in solving here is the scalability bottleneck: when we increase the number of cores available to a Go application, we want to see a performance improvement.

Edit: Design doc: https://go-review.googlesource.com/c/proposal/+/202857

I've already built both an out-of-tree and an in-tree prototype of a solution, based on a bitmap representing the whole heap and a radix tree over that bitmap. The key idea with bitmaps is that we may "land grab" several pages at once and cache them in a P. Two RPC benchmarks (a Google-internal one and one based on Tile38) show up to a 30% increase in throughput and up to a 40% drop in tail latencies (90th, 99th percentile) on a 48-core machine, with the benefit increasing with the number of cores available to the benchmark.
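
To make the bitmap idea concrete, here's a minimal, runnable sketch of claiming a contiguous run of pages from a single 64-page chunk bitmap. It's illustrative only: the function name and chunk granularity are mine, and the real design layers radix-tree summaries over many such chunks so a search can skip fully-allocated regions without scanning them.

```go
package main

import "fmt"

// findAndClaim searches a 64-page chunk bitmap (1 bit per page,
// 1 = in use) for n contiguous free pages, claims them, and returns
// the index of the first page, or -1 if no such run exists.
func findAndClaim(chunk *uint64, n uint) int {
	mask := uint64(1)<<n - 1 // bit pattern covering n consecutive pages
	for i := uint(0); i+n <= 64; i++ {
		if *chunk&(mask<<i) == 0 { // all n pages free at offset i?
			*chunk |= mask << i // "land grab": mark them all allocated
			return int(i)
		}
	}
	return -1
}

func main() {
	var chunk uint64 = 0b1111             // pages 0-3 already in use
	fmt.Println(findAndClaim(&chunk, 3))  // 4: first free run of 3 pages
	fmt.Printf("%b\n", chunk)             // 1111111: pages 0-6 now in use
}
```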

@mknyszek mknyszek added this to the Go1.14 milestone Oct 23, 2019
@mknyszek mknyszek self-assigned this Oct 23, 2019
@gopherbot

Change https://golang.org/cl/200439 mentions this issue: runtime: place lower limit on trigger ratio

@gopherbot

Change https://golang.org/cl/195698 mentions this issue: runtime: count scavenged bits for new allocation for new page allocator

@gopherbot

Change https://golang.org/cl/190621 mentions this issue: runtime: add packed bitmap summaries

@gopherbot

Change https://golang.org/cl/196640 mentions this issue: runtime: make more page sweeper operations atomic

@gopherbot

Change https://golang.org/cl/195697 mentions this issue: runtime: add scavenging code for new page allocator

@gopherbot

Change https://golang.org/cl/196643 mentions this issue: runtime: add page cache and tests

@gopherbot

Change https://golang.org/cl/190622 mentions this issue: runtime: add new page allocator core

@gopherbot

Change https://golang.org/cl/190619 mentions this issue: runtime: add new page allocator constants and description

@gopherbot

Change https://golang.org/cl/196642 mentions this issue: runtime: add per-p mspan cache

@gopherbot

Change https://golang.org/cl/201763 mentions this issue: runtime: make the scavenger self-paced

@gopherbot

Change https://golang.org/cl/190620 mentions this issue: runtime: add mallocbits and tests

@gopherbot

Change https://golang.org/cl/196639 mentions this issue: runtime: remove unnecessary large parameter to mheap_.alloc

@gopherbot

Change https://golang.org/cl/195701 mentions this issue: runtime: add per-p page allocation cache

@gopherbot

Change https://golang.org/cl/201764 mentions this issue: runtime: integrate new page allocator into runtime

@gopherbot

Change https://golang.org/cl/195700 mentions this issue: runtime: remove old page allocator

@gopherbot

Change https://golang.org/cl/195699 mentions this issue: runtime: add option to scavenge with lock held throughout

@gopherbot

Change https://golang.org/cl/201765 mentions this issue: runtime: switch to new page allocator

@gopherbot

Change https://golang.org/cl/196638 mentions this issue: runtime: ensure heap memstats are updated atomically

@gopherbot

Change https://golang.org/cl/196641 mentions this issue: runtime: refactor mheap_.alloc* into allocSpan

@mknyszek
Contributor Author

Gopherbot isn't updating the issue with this, but the design doc is at https://go-review.googlesource.com/c/proposal/+/202857. I'll update the first post in the issue once it lands.

CC: @aclements @randall77 @cherrymui

@gopherbot

Change https://golang.org/cl/203318 mentions this issue: runtime: fix (*gcSweepBuf).block guarantees

@gopherbot

Change https://golang.org/cl/203858 mentions this issue: runtime: compute whether a span needs zeroing in the new page allocator

@gopherbot

Change https://golang.org/cl/203859 mentions this issue: runtime: make allocNeedsZero lock-free

@un000

un000 commented Oct 29, 2019

#31222 may be related.
Could you test the new allocator with a reproducer from the ticket above?

@mknyszek
Contributor Author

@un000 It's possible but unlikely this will help.

Let's assume that the problem in that issue is entirely that a long allocation blocks STW for 43 ms. Firstly, 100 MB allocations don't really get any faster with this proposal. In that issue, asking for 100 MB is definitely going to go to the OS and map pages in. No matter what, we're bound by how quickly we can get 100 MB worth of committed address space from the OS.

@mknyszek
Contributor Author

mknyszek commented Oct 29, 2019

@un000 Well, it helps maxWait at least and brings it down to 1.0s vs. 1.3s on my machine, so perhaps I spoke too soon.

But I'm hitting stalls trying to copy the traces to verify whether there's an improvement in the STW delay. I'll try again later and let you know what I find.

gopherbot pushed a commit that referenced this issue Nov 8, 2019
This change defines a maximum supported physical and huge page size in
the runtime based on the new page allocator's implementation, and uses
them where appropriate.

Furthermore, if the system's huge page size exceeds the maximum supported size, we simply ignore it silently.

It also fixes a huge-page-related test that was only triggered by a condition that was definitely wrong.

Finally, it adds a few TODOs related to code clean-up and supporting
larger huge page sizes.

Updates #35112.
Fixes #35431.

Change-Id: Ie4348afb6bf047cce2c1433576d1514720d8230f
Reviewed-on: https://go-review.googlesource.com/c/go/+/205937
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
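
As a rough illustration of the policy described above (the constant value and function name are assumptions, not the runtime's):

```go
package main

import "fmt"

// If the OS reports a huge page size larger than what the page
// allocator supports, ignore huge pages entirely rather than crash.
const maxPhysHugePageSize = 1 << 22 // 4 MiB; illustrative assumption

func effectiveHugePageSize(reported uintptr) uintptr {
	if reported > maxPhysHugePageSize {
		return 0 // silently ignore: behave as if there are no huge pages
	}
	return reported
}

func main() {
	fmt.Println(effectiveHugePageSize(1 << 30)) // 0: 1 GiB pages unsupported, ignored
	fmt.Println(effectiveHugePageSize(1 << 21)) // 2097152: 2 MiB pages supported
}
```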
gopherbot pushed a commit that referenced this issue Nov 8, 2019
mheap_.alloc currently accepts both a spanClass and a "large" parameter
indicating whether the allocation is large. These are redundant, since
spanClass.sizeclass() == 0 is an equivalent way to determine this and is
already used in mheap_.alloc. There are no places in the runtime where
the size class could be non-zero and large == true.

Updates #35112.

Change-Id: Ie66facf8f0faca6f4cd3d20a8ac4bc259e11823d
Reviewed-on: https://go-review.googlesource.com/c/go/+/196639
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
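
For context, a sketch of how a span class packs its size class, which is why the extra flag was redundant (an illustrative copy of the packing, not the runtime source):

```go
package main

import "fmt"

// spanClass packs the size class into the upper bits and a noscan
// flag into the lowest bit. Size class 0 is reserved for large
// (multi-page) allocations, so a separate "large" parameter adds
// no information.
type spanClass uint8

func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass) << 1
	if noscan {
		sc |= 1
	}
	return sc
}

func (sc spanClass) sizeclass() int8 { return int8(sc >> 1) }

func main() {
	large := makeSpanClass(0, true)  // large-object span
	small := makeSpanClass(5, false) // small-object span, size class 5
	fmt.Println(large.sizeclass() == 0, small.sizeclass() == 0) // true false
}
```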
gopherbot pushed a commit that referenced this issue Nov 8, 2019
This change makes it so that allocation and free related page sweeper
metadata operations (e.g. pageInUse and pagesInUse) are atomic rather
than protected by the heap lock. This will help in reducing the length
of the critical path with the heap lock held in future changes.

Updates #35112.

Change-Id: Ie82bff024204dd17c4c671af63350a7a41add354
Reviewed-on: https://go-review.googlesource.com/c/go/+/196640
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
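
The general shape of such a change, as a hedged sketch (the runtime uses its internal atomics rather than sync/atomic, and this is not its exact code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// setBitAtomic sets bit i of *word without holding any lock, using a
// compare-and-swap loop. Updates like this are safe to perform
// outside the heap lock's critical section, shortening it.
func setBitAtomic(word *uint64, i uint) {
	for {
		old := atomic.LoadUint64(word)
		if atomic.CompareAndSwapUint64(word, old, old|uint64(1)<<i) {
			return
		}
	}
}

func main() {
	var pageInUse uint64 // stand-in for one word of sweeper metadata
	setBitAtomic(&pageInUse, 3)
	fmt.Printf("%b\n", pageInUse) // 1000
}
```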
gopherbot pushed a commit that referenced this issue Nov 8, 2019
Currently gcSweepBuf guarantees that push operations may be performed
concurrently with each other and that block operations may be performed
concurrently with push operations as well.

Unfortunately, this isn't quite true. The existing code allows push
operations to happen concurrently with each other, but block operations
may return blocks with nil entries. This can happen if two concurrent
pushers each grab a slot to push to, and the first one (the one with
the earlier slot in the buffer) hasn't yet written its span value when
block is called. The existing code in block only checks whether the
very last value in the block is nil, when really an arbitrary number of
the last few values in the block may or may not be nil.

Today, this case can't actually happen because when push operations
happen concurrently during a GC (which is the only time block is
called), they only ever happen during an allocation with the heap lock
held, effectively serializing them. A block operation may happen
concurrently with one of these pushes, but its callers will never see a
nil mspan. Outside of a GC, this isn't a problem because although push
operations from allocations can run concurrently with push operations
from sweeping, block operations will never run.

In essence, the real concurrency guarantees provided by gcSweepBuf are
that block operations may happen concurrently with push operations, but
that push operations may not be concurrent with each other if there are
any block operations.

To fix this, and to prepare for push operations happening without the
heap lock held in a future CL, we update the documentation for block to
correctly state that there may be nil entries in the returned slice.
While we're here, make the mspan writes into the buffer atomic to avoid
a block user racing on a nil check, and document that the user should
load mspan values from the returned slice atomically. Finally, we make
all callers of block adhere to the new rules.

We choose to allow nil values rather than filter them out because the
only caller of block is markrootSpans, and if it catches a nil entry,
then there wasn't anything to mark in there anyway since the span is
just being created.

Updates #35112.

Change-Id: I6450aab15f51690d7a000ba5b3d529cf2ca5da1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/203318
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
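
A minimal sketch of the publish/read pattern described above, assuming illustrative types (this is not gcSweepBuf itself):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Pushers reserve a slot with an atomic counter, then publish the
// pointer with an atomic store. A concurrent reader can observe a
// slot that is reserved but not yet published, so it must tolerate
// nil entries anywhere in the snapshot, not just at the end.
type spanBuf struct {
	n     atomic.Uint64
	slots [128]atomic.Pointer[int] // *int stands in for *mspan
}

func (b *spanBuf) push(p *int) {
	i := b.n.Add(1) - 1 // reserve a slot
	b.slots[i].Store(p) // publish it atomically
}

func (b *spanBuf) snapshot() []*int {
	n := b.n.Load()
	out := make([]*int, 0, n)
	for i := uint64(0); i < n; i++ {
		if p := b.slots[i].Load(); p != nil { // nil: a push is in flight
			out = append(out, p)
		}
	}
	return out
}

func main() {
	var b spanBuf
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		v := i
		wg.Add(1)
		go func() { defer wg.Done(); b.push(&v) }()
	}
	wg.Wait()
	fmt.Println(len(b.snapshot())) // 10 once all pushes have completed
}
```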
gopherbot pushed a commit that referenced this issue Nov 8, 2019
This change combines the functionality of allocSpanLocked, allocManual,
and alloc_m into a new method called allocSpan. While these methods'
abstraction boundaries are OK when the heap lock is held throughout,
they start to break down when we want finer-grained locking in the page
allocator.

allocSpan does just that, and only locks the heap when it absolutely has
to. Piggy-backing off of work in previous CLs to make more of span
initialization lockless, this change makes span initialization entirely
lockless as part of the reorganization.

Ultimately this change will enable us to add a lockless fast path to
allocSpan.

Updates #35112.

Change-Id: I99875939d75fb4e958a67ac99e4a7cda44f06864
Reviewed-on: https://go-review.googlesource.com/c/go/+/196641
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
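
A toy sketch of the structure this enables, with all names hypothetical: the heap lock guards only the smallest possible critical section, and span initialization happens outside it.

```go
package main

import (
	"fmt"
	"sync"
)

type span struct{ id int }

type pCache struct{ free []*span }

func (c *pCache) tryAlloc() *span {
	if n := len(c.free); n > 0 {
		s := c.free[n-1]
		c.free = c.free[:n-1]
		return s // fast path: no heap lock taken
	}
	return nil
}

type heap struct {
	mu   sync.Mutex
	next int // stand-in for real allocator state
}

func (h *heap) allocSpan(c *pCache) *span {
	if s := c.tryAlloc(); s != nil {
		return s
	}
	h.mu.Lock() // slow path: lock held only while claiming pages
	h.next++
	id := h.next
	h.mu.Unlock()
	return &span{id: id} // initialization is lock-free
}

func main() {
	h := &heap{}
	c := &pCache{free: []*span{{id: 100}}}
	fmt.Println(h.allocSpan(c).id) // 100: served by the per-P cache
	fmt.Println(h.allocSpan(c).id) // 1: served under the heap lock
}
```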
gopherbot pushed a commit that referenced this issue Nov 8, 2019
This change adds a per-p mspan object cache similar to the sudog cache.
Unfortunately this cache can't quite operate like the sudog cache, since
it is used in contexts where write barriers are disallowed (i.e.
allocation codepaths), so rather than managing an array and a slice,
it's just an array and a length. A little bit more unsafe, but avoids
any write barriers.

The purpose of this change is to reduce the number of operations which
require the heap lock in allocation, paving the way for a lockless fast
path.

Updates #35112.

Change-Id: I32cfdcd8528fb7be985640e4f3a13cb98ffb7865
Reviewed-on: https://go-review.googlesource.com/c/go/+/196642
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
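
A sketch of the array-plus-length shape described above, with illustrative sizes and types (the real cache additionally manages its pointer writes with unsafe operations, for exactly the write-barrier reason the commit gives):

```go
package main

import "fmt"

type mspan struct{ id int }

// spanCache: a fixed array plus an explicit length rather than a
// slice, because the allocation code paths it serves must not execute
// write barriers that slice manipulation could introduce.
type spanCache struct {
	buf [128]*mspan
	len int
}

func (c *spanCache) put(s *mspan) bool {
	if c.len == len(c.buf) {
		return false // full: caller returns the span to the heap
	}
	c.buf[c.len] = s
	c.len++
	return true
}

func (c *spanCache) get() *mspan {
	if c.len == 0 {
		return nil // empty: caller refills from the heap (locked)
	}
	c.len--
	s := c.buf[c.len]
	c.buf[c.len] = nil
	return s
}

func main() {
	var c spanCache
	c.put(&mspan{id: 1})
	fmt.Println(c.get().id) // 1
}
```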
gopherbot pushed a commit that referenced this issue Nov 8, 2019
This change adds a page cache structure which owns a chunk of free pages
at a given base address. It also adds code to allocate to this cache
from the page allocator. Finally, it adds tests for both.

Notably this change does not yet integrate the code into the runtime,
just into runtime tests.

Updates #35112.

Change-Id: Ibe121498d5c3be40390fab58a3816295601670df
Reviewed-on: https://go-review.googlesource.com/c/go/+/196643
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
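
A sketch of what such a structure might look like, with field names taken from the description above but otherwise illustrative:

```go
package main

import "fmt"

const pageSize = 8192 // illustrative; Go's runtime page is 8 KiB

// pageCache owns the free pages in one aligned 64-page region,
// tracked by a word of free bits and a word of scavenged bits.
type pageCache struct {
	base  uintptr // base address of the 64-page region
	cache uint64  // 1 bit per page: 1 means free and owned by this cache
	scav  uint64  // 1 bit per page: 1 means the page is scavenged
}

func (c *pageCache) empty() bool { return c.cache == 0 }

func main() {
	// The page allocator fills a cache in one locked step, handing
	// over every free page in the region at once (the "land grab").
	c := pageCache{base: 0x200000, cache: ^uint64(0)}
	fmt.Printf("empty=%v lastPage=%#x\n", c.empty(), c.base+63*pageSize)
	// empty=false lastPage=0x27e000
}
```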
gopherbot pushed a commit that referenced this issue Nov 8, 2019
This change adds a per-p free page cache which the page allocator may
allocate out of without a lock. The change also introduces a completely
lockless page allocator fast path.

Although the cache contains at most 64 pages (and usually fewer), the
vast majority (85%+) of page allocations are exactly 1 page in size.

Updates #35112.

Change-Id: I170bf0a9375873e7e3230845eb1df7e5cf741b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/195701
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
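
A sketch of why this fast path can be lockless and cheap, reusing the page-cache shape from the earlier sketch (illustrative, not the runtime's code): with 1 bit per free page, the common 1-page allocation is a trailing-zeros scan plus a bit clear, touching no shared heap state.

```go
package main

import (
	"fmt"
	"math/bits"
)

const pageSize = 8192 // illustrative 8 KiB pages

type pageCache struct {
	base  uintptr
	cache uint64 // 1 bit per page: 1 means free
}

func (c *pageCache) allocPage() (uintptr, bool) {
	if c.cache == 0 {
		return 0, false // empty: refill from the page allocator (locked)
	}
	i := bits.TrailingZeros64(c.cache) // lowest free page
	c.cache &^= uint64(1) << i         // claim it
	return c.base + uintptr(i)*pageSize, true
}

func main() {
	c := pageCache{base: 0x100000, cache: 0b110} // pages 1 and 2 free
	addr, ok := c.allocPage()
	fmt.Printf("%v %#x\n", ok, addr) // true 0x102000 (page 1)
}
```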
@gopherbot

Change https://golang.org/cl/206199 mentions this issue: runtime: copy some functions from math/bits to runtime/internal/sys

gopherbot pushed a commit that referenced this issue Nov 8, 2019
CL 201765 activated calls from the runtime to functions in math/bits.
When coverage and race detection were simultaneously enabled,
this caused a crash when the covered+race-checked code in
math/bits was called from the runtime before there was even a P.

PS Win for gdlv in helping sort this out.

TODO - next CL intrinsifies the new functions in
runtime/internal/sys

TODO/Would-be-nice - Ctz64 and TrailingZeros64 are the same
function; 386.s is intrinsified; clean all that up.

Fixes #35461.
Updates #35112.

Change-Id: I750a54dba493130ad3e68a06530ede7687d41e1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/206199
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot

Change https://golang.org/cl/206277 mentions this issue: runtime: fix min/max logic in findScavengeCandidate

@gopherbot

Change https://golang.org/cl/206200 mentions this issue: cmd/compile: intrinsify functions added to runtime/internal/sys

gopherbot pushed a commit that referenced this issue Nov 9, 2019
This restores intrinsic status to functions copied from math/bits
into runtime/internal/sys, as an aid to runtime performance.

Updates #35112.

Change-Id: I41a7d87cf00f1e64d82aa95c5b1000bc128de820
Reviewed-on: https://go-review.googlesource.com/c/go/+/206200
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
gopherbot pushed a commit that referenced this issue Nov 10, 2019
This change makes the test addresses start at 1 GiB instead of 2 GiB to
support mips and mipsle, which only have 31-bit address spaces.

It also changes some tests to use smaller offsets for the chunk index to
avoid jumping too far ahead in the address space to support 31-bit
address spaces. The tests don't require such large jumps for what
they're testing anyway.

Updates #35112.
Fixes #35440.

Change-Id: Ic68ff2b0a1f10ef37ac00d4bb5b910ddcdc76f2e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205938
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
gopherbot pushed a commit that referenced this issue Nov 11, 2019
Before this CL, if max > min and max was unaligned to min, then the
function could return an unaligned (unaligned to min) region to
scavenge. On most platforms, this leads to some kind of crash.

Fix this by explicitly aligning max to the next multiple of min.

Fixes #35445.
Updates #35112.

Change-Id: I0af42d4a307b48a97e47ed152c619d77b0298291
Reviewed-on: https://go-review.googlesource.com/c/go/+/206277
Reviewed-by: Ian Lance Taylor <iant@golang.org>
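
For illustration, the standard power-of-two round-up that such a fix implies (a sketch; the runtime has an equivalent helper):

```go
package main

import "fmt"

// alignUp rounds n up to the next multiple of a, where a is a power
// of two, so a returned scavenge candidate can never be unaligned.
func alignUp(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

func main() {
	fmt.Println(alignUp(13, 8), alignUp(16, 8)) // 16 16
}
```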
@gopherbot

Change https://golang.org/cl/206978 mentions this issue: runtime: check summary before scavenging in fast path

gopherbot pushed a commit that referenced this issue Nov 15, 2019
In scavengeOne's fast path, we currently don't check the summary for the
chunk that scavAddr points to, which means that we might accidentally
scavenge unused address space if the previous scavenge moves the
scavAddr into that space. The result of this today is a crash.

This change makes it so that scavengeOne's fast path only happens after
the check, following the comment in mpagealloc.go. It also adds a test
for this case.

Fixes #35465.
Updates #35112.

Change-Id: I861d44ee75e42a0e1f5aaec243bc449228273903
Reviewed-on: https://go-review.googlesource.com/c/go/+/206978
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
@ianlancetaylor
Contributor

@mknyszek Anything else to do for this issue?

@mknyszek
Contributor Author

mknyszek commented Dec 5, 2019

Nope! Closing.

@gopherbot

Change https://golang.org/cl/219121 mentions this issue: design/35112-scaling-the-page-allocator.md: update proposal
