runtime: throttle heap growth #14735

Open · dvyukov opened this issue Mar 9, 2016 · 10 comments
Labels: compiler/runtime (Issues related to the Go compiler and/or runtime.)

@dvyukov (Member) commented Mar 9, 2016

The GC goal is calculated as:

goal = reachable * (1 + GOGC/100)

This works well in steady state, and it is not a problem when the heap is shrinking. But it is a problem when the reachable heap size has episodic short-term spikes.

Consider that the typical reachable heap size is 1GB, so the heap grows to 2GB (provided GOGC=100). Now consider that there is a short spike of reachable heap size to 1.5GB. If GC is triggered after the spike, everything is fine. However, if we are unlucky and GC observes 1.5GB of reachable heap, it will set the goal to 3GB. And at this point we will necessarily consume these 3GB and increase RSS by 1.5x unnecessarily.

The proposal is to throttle the GC goal if we see that the new goal will increase the max heap size. We can either use, say, GOGC/2 in such cases, or maybe set the heap goal to the current max heap size. In the above example, if we set the goal to 2GB we can avoid growing the heap at all. This way we pay a fixed amount of CPU during heap growth, but can get lower RSS in return. Note that heap growth can't continue infinitely, so the amount of additional CPU is limited.

FWIW, client-side GC-based systems (most notably JavaScript) go to great effort to throttle heap growth in such scenarios. AFAIK, V8 uses something like GOGC=10% during growth just to avoid over-allocating.
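
To make the shape of the proposal concrete, here is a rough sketch of the GOGC/2 variant; the function and its bookkeeping of the previous max heap size are hypothetical, not the runtime's actual pacer:

package main

import "fmt"

const gib = 1 << 30

// nextGoal is a sketch of the proposed calculation: if the usual goal would
// grow the heap past its previous maximum, throttle the growth by using
// GOGC/2 instead (the other variant mentioned above would clamp the goal to
// prevMaxHeap). None of these names exist in the runtime, and the real pacer
// is more involved.
func nextGoal(reachable, prevMaxHeap, gogc uint64) uint64 {
    goal := reachable * (100 + gogc) / 100
    if goal > prevMaxHeap {
        goal = reachable * (100 + gogc/2) / 100
    }
    return goal
}

func main() {
    // Steady state from the example: 1GB reachable, heap already at 2GB.
    fmt.Printf("%.2f GB\n", float64(nextGoal(1*gib, 2*gib, 100))/gib) // 2.00 GB
    // Spike to 1.5GB reachable: the goal becomes 2.25GB instead of 3GB.
    fmt.Printf("%.2f GB\n", float64(nextGoal(3*gib/2, 2*gib, 100))/gib) // 2.25 GB
}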

@dvyukov (Member Author) commented Mar 9, 2016

@bcmills (Contributor) commented Mar 9, 2016

The proposal is to throttle the GC goal if we see that the new goal will increase the max heap size.

What would GOGC mean in that context? It would no longer be the case that the next GC cycle triggers "when the ratio of freshly allocated data to live data remaining after the previous collection reaches this percentage."

Note that heap growth can't continue infinitely, so the amount of additional CPU is limited.

Don't we return unused pages to the OS (e.g. via MADV_DONTNEED) after a spike subsides? To cap the amount of additional CPU we'd probably need to stop doing that (which I'd argue is fine, but I would expect to encounter many users who vehemently disagree).

@dvyukov (Member Author) commented Mar 9, 2016

What would GOGC mean in that context?

I believe nothing changes for the end user. GOGC is expressed in terms of the live heap during a GC, and you generally don't know when GC is triggered. With concurrent marking and sweeping, these things are blurred even more. Let's just pretend that GC has a higher chance of being triggered at a more fortunate point.

Don't we return unused pages to the OS (e.g. via MADV_DONTNEED) after a spike subsides?

This is a good point. I feel that both things are useful and need to coexist. I don't have a ready answer.
Maybe it is OK as is, because memory is released to the OS at a very low frequency. And if it was released, we do want to try to keep RSS at that lower level.
Maybe it is OK because the additional CPU cost is not that high.
Maybe we can introduce an even lower-frequency metric to amortize the additional CPU cost.

@aclements (Member) commented:

Interesting. Are you basically proposing that we low-pass filter the heap goal? (Or at least low-pass filter it when it's going up.)

I'm curious for more details about v8's GC scheduling if you have them or have a pointer.

Don't we return unused pages to the OS (e.g. via MADV_DONTNEED) after a spike subsides? To cap the amount of additional CPU we'd probably need to stop doing that (which I'd argue is fine, but I would expect to encounter many users who vehemently disagree).

If anything, we need to return memory to the OS more aggressively. In this case, returning it more aggressively will make a heap spike less costly in terms of RSS over time. But in general, there's no such thing as "unused memory". If it's not being used by an application, it's being used for something else like OS file caches or, on mobile, whole other applications. There's obviously a balance to be struck since we need to amortize the cost of returning memory to the OS and then potentially asking for it back. We don't want to return a page the instant we're not using it, but there is most certainly value in returning memory to the OS.

@rsc (Contributor) commented Mar 10, 2016

@dvyukov, what you are describing is the exact opposite of the problem I see most often when looking at GC overheads. What I see is programs like the compiler that tend to have long phases where they mostly allocate and don't give up much garbage from those allocations. In those phases, you'd ideally want the goal to be even higher, because the GC isn't going to reclaim much anyway. What you're describing would move the goal in the opposite direction, making it grow even more slowly and therefore making GC cost even more. That is, you are optimizing space at the cost of increased GC overhead. Often people want the opposite.

I am not convinced we should change the goal calculation. There are clearly scenarios where it's too big and where it's too small, but at least it's understandable, and GOGC gives control in the cases where it absolutely needs changing.

@dvyukov (Member Author) commented Mar 10, 2016

Interesting. Are you basically proposing that we low-pass filter the heap goal? (Or at least low-pass filter it when it's going up.)

Yes, low-pass filter when heap is growing.
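
For illustration only, such a filter could look roughly like this; an exponentially weighted moving average applied only while the goal is rising, with the names and the 0.5 smoothing factor made up for the example:

package main

import "fmt"

// smoothGoal low-pass filters the heap goal: upward moves are smoothed so a
// one-off spike in reachable memory only drags the goal part of the way up,
// while downward moves are applied immediately.
func smoothGoal(prevGoal, rawGoal float64) float64 {
    if rawGoal <= prevGoal {
        return rawGoal
    }
    const alpha = 0.5 // smoothing factor: 1 = no filtering, 0 = never grow
    return prevGoal + alpha*(rawGoal-prevGoal)
}

func main() {
    goal := 2.0 // GB, steady state from the example above
    // One spike pushes the raw goal to 3GB, then it returns to 2GB.
    for _, raw := range []float64{3, 2, 2} {
        goal = smoothGoal(goal, raw)
        fmt.Printf("%.2f GB\n", goal) // 2.50, then back to 2.00
    }
}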

I'm curious for more details about v8's GC scheduling if you have them or have a pointer.

Can't find anything now. Probably it was just personal communication.

If anything, we need to return memory to the OS more aggressively.

Completely agree.
If the additional GC cost turns out to be noticeable, it seems to me that introducing an additional lower-frequency metric (max RSS during the last X minutes) would allow us to reduce the overhead to any target level.

@dvyukov, what you are describing is the exact opposite of the problem I see most often when looking at GC overheads.

I wonder if we can do heap growth throttling for long-running server and client programs without compromising the compiler... I looked at the compiler gctrace, and the steady growth is easily recognizable. So the simplest heuristic would be to throttle the GC goal once; if reachable memory continues to grow during the next GC, give up on throttling.
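
A minimal sketch of that heuristic, with invented names and the same GOGC/2 throttling as in the sketch above (a real pacer would need to integrate this with the rest of its state):

package main

import "fmt"

// throttler implements "throttle once, then give up": the first GC that sees
// reachable memory jump gets a throttled goal; if reachable memory is still
// growing at the next GC (steady growth, as in the compiler), throttling is
// abandoned so GC does not keep costing extra CPU.
type throttler struct {
    prevReachable uint64
    throttledLast bool
}

func (t *throttler) nextGoal(reachable, gogc uint64) uint64 {
    growing := reachable > t.prevReachable
    t.prevReachable = reachable
    if growing && !t.throttledLast {
        t.throttledLast = true
        return reachable * (100 + gogc/2) / 100 // throttle this one cycle
    }
    if !growing {
        t.throttledLast = false // growth stopped: re-arm the throttle
    }
    return reachable * (100 + gogc) / 100 // normal GOGC-based goal
}

func main() {
    var t throttler
    // Compiler-like steady growth of the reachable heap (in MB): the goal is
    // throttled once, then falls back to the plain GOGC goal.
    for _, reachable := range []uint64{3, 5, 9, 15, 20} {
        fmt.Println(t.nextGoal(reachable, 100))
    }
}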

and GOGC gives control in the cases where it absolutely needs changing.

The absolute need is more or less always about reducing the GOGC value (you have an infinite amount of CPU time, but limited memory). So temporarily reducing GOGC should not be an issue. And the throttling should correlate with GOGC (e.g. GOGC/2), so you can still control it with GOGC.

FTR, here are compiler traces for the net package with different values of GOGC (in the A->B->C heap numbers below, A is the heap size when the GC starts, B the size when it ends, and C the live heap):

GOGC=100
========

0.94user 0.05system 0:00.58elapsed 169%CPU (0avgtext+0avgdata 66404maxresident)k
0.99user 0.05system 0:00.58elapsed 179%CPU (0avgtext+0avgdata 66720maxresident)k
0.88user 0.15system 0:00.55elapsed 186%CPU (0avgtext+0avgdata 67256maxresident)k
0.85user 0.15system 0:00.56elapsed 178%CPU (0avgtext+0avgdata 67084maxresident)k
0.97user 0.14system 0:00.58elapsed 190%CPU (0avgtext+0avgdata 66928maxresident)k


gc 1 @0.004s 1%: 0.12+2.8+0.35 ms clock, 0.51+0.38/3.7/1.5+1.4 ms cpu, 4->4->3 MB, 5 MB goal, 48 P
gc 2 @0.015s 2%: 0.12+2.4+0.45 ms clock, 1.2+0.25/7.1/2.8+4.5 ms cpu, 6->7->5 MB, 7 MB goal, 48 P
gc 3 @0.031s 2%: 0.048+2.6+0.43 ms clock, 0.88+0.39/12/5.8+7.7 ms cpu, 10->11->9 MB, 11 MB goal, 48 P
gc 4 @0.069s 2%: 0.043+2.9+0.44 ms clock, 0.87+1.8/22/19+8.9 ms cpu, 17->18->15 MB, 19 MB goal, 48 P
gc 5 @0.135s 1%: 0.043+4.4+0.39 ms clock, 1.2+2.0/28/20+11 ms cpu, 29->30->20 MB, 31 MB goal, 48 P
gc 6 @0.233s 1%: 0.052+6.1+0.41 ms clock, 1.6+2.5/37/14+12 ms cpu, 39->40->25 MB, 41 MB goal, 48 P
gc 7 @0.356s 1%: 0.017+5.5+0.34 ms clock, 0.56+2.7/46/23+11 ms cpu, 49->49->29 MB, 50 MB goal, 48 P
gc 8 @0.484s 1%: 0.023+8.0+0.50 ms clock, 0.75+4.8/60/24+16 ms cpu, 58->59->37 MB, 59 MB goal, 48 P

GOGC=50
=======

1.33user 0.06system 0:00.62elapsed 225%CPU (0avgtext+0avgdata 58680maxresident)k
1.43user 0.05system 0:00.63elapsed 233%CPU (0avgtext+0avgdata 58128maxresident)k
1.40user 0.07system 0:00.62elapsed 236%CPU (0avgtext+0avgdata 59396maxresident)k
1.45user 0.08system 0:00.61elapsed 249%CPU (0avgtext+0avgdata 56272maxresident)k
1.37user 0.05system 0:00.62elapsed 230%CPU (0avgtext+0avgdata 60124maxresident)k

gc 1 @0.001s 1%: 0.16+2.6+0.27 ms clock, 0.48+0.017/2.6/0.58+0.83 ms cpu, 3->4->3 MB, 4 MB goal, 48 P
gc 2 @0.005s 2%: 0.043+1.9+0.27 ms clock, 0.39+0.30/2.7/0.72+2.4 ms cpu, 4->4->4 MB, 5 MB goal, 48 P
gc 3 @0.008s 3%: 0.031+2.1+0.30 ms clock, 0.41+0.32/4.0/0.69+4.0 ms cpu, 4->5->4 MB, 5 MB goal, 48 P
gc 4 @0.012s 4%: 0.050+2.2+0.30 ms clock, 0.81+0.23/5.6/1.0+4.8 ms cpu, 5->5->5 MB, 6 MB goal, 48 P
gc 5 @0.018s 4%: 0.031+2.4+0.29 ms clock, 0.62+0.49/6.9/1.7+5.8 ms cpu, 6->7->6 MB, 7 MB goal, 48 P
gc 6 @0.025s 4%: 0.034+2.6+0.32 ms clock, 0.80+0.51/9.5/3.7+7.5 ms cpu, 8->9->7 MB, 9 MB goal, 48 P
gc 7 @0.035s 4%: 0.042+3.2+0.32 ms clock, 1.1+0.61/10/4.4+8.4 ms cpu, 10->11->10 MB, 11 MB goal, 48 P
gc 8 @0.051s 4%: 0.036+3.4+0.31 ms clock, 1.0+0.73/14/9.3+8.7 ms cpu, 14->14->13 MB, 15 MB goal, 48 P
gc 9 @0.076s 3%: 0.030+4.2+0.39 ms clock, 0.92+1.9/22/14+11 ms cpu, 19->19->18 MB, 20 MB goal, 48 P
gc 10 @0.108s 3%: 0.011+3.9+0.35 ms clock, 0.35+2.0/21/27+11 ms cpu, 25->26->19 MB, 26 MB goal, 48 P
gc 11 @0.159s 2%: 0.025+3.8+0.37 ms clock, 0.82+1.3/37/8.3+12 ms cpu, 29->29->22 MB, 30 MB goal, 48 P
gc 12 @0.218s 2%: 0.024+21+0.43 ms clock, 0.79+1.4/39/1.8+13 ms cpu, 32->32->24 MB, 33 MB goal, 48 P
gc 13 @0.298s 2%: 0.012+5.0+0.40 ms clock, 0.40+2.6/40/27+13 ms cpu, 35->36->26 MB, 36 MB goal, 48 P
gc 14 @0.366s 2%: 0.012+6.8+0.41 ms clock, 0.40+2.6/42/32+13 ms cpu, 39->39->29 MB, 40 MB goal, 48 P
gc 15 @0.440s 2%: 0.026+9.2+0.45 ms clock, 0.85+5.3/61/8.1+14 ms cpu, 43->43->32 MB, 44 MB goal, 48 P
gc 16 @0.519s 2%: 0.014+7.0+0.43 ms clock, 0.44+3.7/49/49+14 ms cpu, 48->48->36 MB, 49 MB goal, 48 P

GOGC=30
=======

1.85user 0.06system 0:00.65elapsed 294%CPU (0avgtext+0avgdata 53980maxresident)k
1.64user 0.12system 0:00.65elapsed 267%CPU (0avgtext+0avgdata 54772maxresident)k
1.89user 0.05system 0:00.68elapsed 285%CPU (0avgtext+0avgdata 56444maxresident)k
1.74user 0.07system 0:00.66elapsed 274%CPU (0avgtext+0avgdata 54508maxresident)k
1.65user 0.04system 0:00.66elapsed 255%CPU (0avgtext+0avgdata 55444maxresident)k

gc 1 @0.002s 1%: 0.20+2.3+0.17 ms clock, 0.62+0.14/2.8/0.83+0.53 ms cpu, 3->3->3 MB, 4 MB goal, 48 P
gc 2 @0.005s 2%: 0.059+1.9+0.37 ms clock, 0.53+1.5/0.90/0.51+3.4 ms cpu, 3->3->3 MB, 4 MB goal, 48 P
gc 3 @0.008s 3%: 0.052+2.1+0.36 ms clock, 0.57+0.48/3.4/0.60+4.0 ms cpu, 3->4->3 MB, 4 MB goal, 48 P
gc 4 @0.012s 4%: 0.054+2.4+0.40 ms clock, 0.76+0.23/4.2/0.77+5.6 ms cpu, 4->4->4 MB, 5 MB goal, 48 P
gc 5 @0.016s 4%: 0.065+2.7+0.29 ms clock, 1.1+0.16/7.0/1.1+5.0 ms cpu, 4->5->4 MB, 5 MB goal, 48 P
gc 6 @0.022s 5%: 0.054+2.7+0.40 ms clock, 1.0+0.41/9.0/2.3+7.6 ms cpu, 5->6->5 MB, 6 MB goal, 48 P
gc 7 @0.028s 5%: 0.059+3.2+0.40 ms clock, 1.3+0.62/10/5.3+8.9 ms cpu, 6->7->6 MB, 7 MB goal, 48 P
gc 8 @0.034s 5%: 0.052+3.0+0.45 ms clock, 1.4+0.15/12/6.8+12 ms cpu, 7->8->7 MB, 8 MB goal, 48 P
gc 9 @0.043s 6%: 0.040+2.9+0.44 ms clock, 1.1+0.38/15/0.73+12 ms cpu, 9->9->8 MB, 10 MB goal, 48 P
gc 10 @0.054s 6%: 0.051+3.8+0.40 ms clock, 1.5+1.4/24/30+12 ms cpu, 10->11->10 MB, 11 MB goal, 48 P
gc 11 @0.068s 6%: 0.012+3.7+0.42 ms clock, 0.40+1.0/17/22+13 ms cpu, 13->13->12 MB, 14 MB goal, 48 P
gc 12 @0.089s 5%: 0.014+3.9+0.42 ms clock, 0.44+1.9/23/33+13 ms cpu, 16->16->15 MB, 17 MB goal, 48 P
gc 13 @0.114s 5%: 0.014+4.0+0.42 ms clock, 0.45+2.3/24/17+13 ms cpu, 19->20->18 MB, 20 MB goal, 48 P
gc 14 @0.141s 4%: 0.015+5.1+0.40 ms clock, 0.50+2.8/33/19+13 ms cpu, 23->24->19 MB, 24 MB goal, 48 P
gc 15 @0.179s 4%: 0.022+3.7+0.46 ms clock, 0.73+1.4/33/29+14 ms cpu, 25->26->20 MB, 26 MB goal, 48 P
gc 16 @0.213s 4%: 0.014+5.2+0.47 ms clock, 0.47+2.0/32/44+15 ms cpu, 27->27->22 MB, 28 MB goal, 48 P
gc 17 @0.259s 3%: 0.014+7.5+0.45 ms clock, 0.47+2.0/36/22+14 ms cpu, 29->29->24 MB, 30 MB goal, 48 P
gc 18 @0.302s 3%: 0.020+22+0.44 ms clock, 0.66+3.1/36/3.2+14 ms cpu, 31->31->25 MB, 32 MB goal, 48 P
gc 19 @0.366s 3%: 0.021+5.4+0.48 ms clock, 0.70+1.5/44/13+15 ms cpu, 33->34->27 MB, 34 MB goal, 48 P
gc 20 @0.418s 3%: 0.014+9.9+0.41 ms clock, 0.46+6.6/57/2.8+13 ms cpu, 35->35->28 MB, 36 MB goal, 48 P
gc 21 @0.477s 3%: 0.012+4.1+0.42 ms clock, 0.40+3.0/41/37+13 ms cpu, 37->38->30 MB, 38 MB goal, 48 P
gc 22 @0.525s 3%: 0.022+6.8+0.53 ms clock, 0.70+3.9/47/29+17 ms cpu, 40->40->33 MB, 41 MB goal, 48 P
gc 23 @0.587s 3%: 0.014+5.9+0.41 ms clock, 0.46+1.5/53/73+13 ms cpu, 43->44->35 MB, 44 MB goal, 48 P
gc 24 @0.639s 3%: 0.023+7.6+0.54 ms clock, 0.74+4.2/62/30+17 ms cpu, 46->47->38 MB, 47 MB goal, 48 P

@dvyukov (Member Author) commented Mar 10, 2016

The opposite optimization can also make sense, and would allow us to win back some CPU time.
Consider that reachable memory is 1 GB and heap size is 2 GB. Now during the next GC we discover reachable memory to be 800 MB, and we would set the next goal to 1.6 GB. But all of that 2GB is still hot, so we could just as well keep the goal at 2 GB and reduce GC overhead by 20%. Of course, if the reachable heap continues to shrink or stays at 800 MB, then we need to reduce the goal over time.

If we have a long-running app where the reachable heap fluctuates between 800MB and 1GB, RSS will still be at 2GB. But if we set the GC goal to 2 GB most of the time, we can continuously save 10% of GC overhead on average.
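
A sketch of what this could look like; the struct, the one-cycle grace period, and the 25% decay rate are all invented for illustration:

package main

import "fmt"

// lazyShrink keeps the goal from collapsing as soon as the reachable heap
// shrinks: memory the process already owns is still hot and RSS will not drop
// immediately anyway, so the goal is held at the current heap size for one
// cycle and then decayed toward the raw GOGC-based goal if the shrink
// persists.
type lazyShrink struct {
    shrinkCycles int     // consecutive cycles with a shrinking raw goal
    goal         float64 // goal handed to the pacer last cycle, in GB
}

func (l *lazyShrink) next(rawGoal, currentHeap float64) float64 {
    if rawGoal >= l.goal {
        // Growing or steady: use the plain GOGC goal and reset the counter.
        l.shrinkCycles = 0
        l.goal = rawGoal
        return l.goal
    }
    l.shrinkCycles++
    if l.shrinkCycles == 1 && currentHeap > rawGoal {
        l.goal = currentHeap // first shrink: keep using the heap we already have
    } else {
        l.goal -= 0.25 * (l.goal - rawGoal) // persistent shrink: decay the goal
    }
    return l.goal
}

func main() {
    p := &lazyShrink{goal: 2.0} // steady state: 1GB reachable, 2GB goal
    // Reachable drops to 800MB (raw goal 1.6GB) and stays there: the goal
    // holds at 2GB for one cycle, then decays toward 1.6GB.
    for i := 0; i < 4; i++ {
        fmt.Printf("%.2f GB\n", p.next(1.6, 2.0))
    }
}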

@minux (Member) commented Mar 10, 2016

Is it possible to adjust the heap goal according to current system memory pressure (but capped by some user setting, e.g. GOGC)?

(This is because saving CPU and saving memory are almost fundamentally incompatible goals for a GC. As a concrete example, on a low-RAM builder (with 512MB of physical RAM) I have to set GOGC as low as 10%, otherwise all.bash will swap like crazy, but on a developer workstation I'd rather reduce build time by disabling GC for the go toolchain.)

For example, we can query getrusage(2) periodically, and reduce the heap goal if we triggered any swaps during the last cycle?
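
A sketch of that kind of polling, Linux-only and using the standard syscall package; note that, as pointed out in the reply below, Linux does not maintain the swap counter, so major page faults are used here as a very rough stand-in:

//go:build linux

package main

import (
    "fmt"
    "syscall"
    "time"
)

// pollPressure periodically samples getrusage(2) and invokes the callback if
// the process took major page faults (had to read pages back from disk) since
// the last sample. This is only an illustration: major faults are a crude
// proxy for memory pressure, and the rusage swap counter is not filled in by
// Linux at all.
func pollPressure(interval time.Duration, underPressure func()) {
    var last syscall.Rusage
    for range time.Tick(interval) {
        var ru syscall.Rusage
        if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
            continue
        }
        if ru.Majflt > last.Majflt {
            underPressure() // e.g. use a lower goal for the next GC cycle
        }
        last = ru
    }
}

func main() {
    go pollPressure(10*time.Second, func() {
        fmt.Println("memory pressure detected; consider shrinking the heap goal")
    })
    select {}
}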

@aclements (Member) commented:

Is it possible to adjust the heap goal according to current system memory pressure (but capped by some user setting, e.g. GOGC)?

IMO, this again falls prey to the fallacy of thinking of memory as "unused." All memory is used all of the time, just for different things. Having an application use all of the system's memory is never the best policy. As an extreme example, at the very least you want to leave space for the application's own binary in the OS file cache; if you put the system under more pressure, the application will slow down, even though you're giving it more memory. In most applications there are other things in memory that are helping out the application, but don't count toward its RSS. I have no idea how to compute the optimal trade-off here, hence heuristics that try to keep the memory footprint within the realm of the minimum possible footprint.

Hierarchical memory management is actually an area I feel should be ripe for research, but I've never seen anything on it. When different layers can adjust their resource usage in different ways, it really feels like there should be a cooperative way to iterate toward an optimum.

For example, we can query getrusage(2) periodically, and reduce the heap goal if we triggered any swaps during the last cycle?

Linux doesn't actually report this. Even if it did, a lot of systems (particularly server systems) will kill your process before you ever start swapping. And even if the system does swap, this won't tell you if your application is causing other applications to swap.

@RLH (Contributor) commented Mar 10, 2016

A spike in memory usage could also signal a phase change in the program. Similarly to the start of a program, a phase change is probably where the GC should be more aggressive about increasing memory size. At the end of the day the only thing that makes sense is what Minux is suggesting, and that is to somehow guess how much RAM and CPU power is actually available and use that as input to the pacer. To use Dmitry's trace as an example, if we have a 48-core machine, then that could be used as an indication that the initial 4MB heap size should be increased more aggressively than if GOMAXPROCS is set to 1. The other metric that has been discussed as an input to the pacer is how many GCs are being performed per second. If we are doing a lot of GCs, then increase the heap size more aggressively.

As long as the user is running with the default GOMAXPROCS and GOGC, I'm comfortable using these heuristics to improve the Go experience. As soon as the knobs are turned, we want to provide a heap model that is as simple as possible, which means that these heuristics will need to be turned off.
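
Purely as an illustration of those heuristics (every threshold and name below is invented, not anything the runtime does), a growth-factor calculation along these lines might look like:

package main

import (
    "fmt"
    "os"
    "runtime"
    "time"
)

// growthFactor sketches the heuristics above: grow the heap goal more
// aggressively early in the run on machines with many Ps, and when GCs are
// firing very frequently, but only while GOGC and GOMAXPROCS are at their
// defaults, so the simple GOGC model stays intact once a knob has been turned.
func growthFactor(gcsLastSecond int, sinceStart time.Duration) float64 {
    if os.Getenv("GOGC") != "" || os.Getenv("GOMAXPROCS") != "" {
        return 1 // a knob was turned: keep the plain GOGC-based goal
    }
    f := 1.0
    if runtime.GOMAXPROCS(0) >= 8 && sinceStart < time.Second {
        f *= 2 // big machine, tiny initial heap: ramp up faster
    }
    if gcsLastSecond > 20 {
        f *= 1.5 // GC runs very often: trade some memory for less CPU
    }
    return f
}

func main() {
    fmt.Println(growthFactor(30, 500*time.Millisecond))
}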
