
runtime: reported Go 1.6 GC throughput problems #14161

Closed
bradfitz opened this issue Jan 30, 2016 · 8 comments

Comments

@bradfitz
Contributor

Konstantin Shaposhnikov and InfluxDB report on golang-dev that Go 1.4 is better than Go 1.5 or Go 1.6 for GC throughput.

Moving an email from Konstantin to a bug:

Hi,

I've collected profiling and gctrace output for the json benchmark that I
described in my original email. I used two m4.10xlarge EC2
instances for testing (40 cores, 160GB of RAM).

All results are here:
https://drive.google.com/file/d/0B9Oy8xv00g4Db2lta0h1WHVJN3c/view?usp=sharing.
In the archive you can find:

  • two binaries: hello15 (compiled with Go 1.5.3) and hello16 (compiled
    with Go 1.6rc1)
  • *.log files with GODEBUG=gctrace=1,schedtrace=1000 output
  • *.prof file with profiling data
  • *.log and *.prof files are named go<NUM_WRK_CONNECTIONS>.

The corresponding benchmarking results:
https://gist.github.com/kostya-sh/7ac7e52b4519694f5f4a and a chart:
https://docs.google.com/spreadsheets/d/1MfB_-lfvyXOXTKS0mdqabPgkHjXgMKFawwQbHOWfP4M/pubchart?oid=1386599290&format=image

I hope this data helps to identify the reason for the decreased throughput.

bradfitz added this to the Go1.6Maybe milestone Jan 30, 2016
@cespare
Contributor

cespare commented Jan 30, 2016

@bradfitz I think that there are two issues here which should be two separate bugs (even if they have the same root cause or solution).

  • Konstantin Shaposhnikov is discussing 1.5 vs. 1.6 performance and GC scalability on large hardware
  • As discussed in this thread, the InfluxDB folks explained that they have throughput problems with 1.4 vs. 1.5+ and are currently pinned to 1.4 for production builds. The Influx folks have said they will file an issue and provide debugging info.

/cc @jwilder @toddboom @pauldix

@bradfitz
Contributor Author

@cespare, I'm happy with 2 bugs or 10 bugs. As long as we don't have zero bugs.

@aclements
Member

@cespare, I would lean toward filing these as two separate bugs, since one is primarily about scalability and the other about throughput. The solutions may be related, but I wouldn't be surprised if they're not exactly the same.

@toddboom

I agree - opening two issues/bugs seems like the right approach here. I’ll
try to get you all the info for what we’ve observed with InfluxDB by the
end of the day today and open a new issue to track it.

Todd

@RLH
Contributor

RLH commented Jan 30, 2016

This addresses Konstantin's numbers.

From the narrative and the gctrace logs it looks like the program is using 40 HW threads (GOMAXPROCS=40) on a 40-core machine, with a target heap size close to 4-5 MB at GOGC=100 and up to 33 MB for GOGC=800. This is on a machine with 160 GB of RAM. The GC is doing a fine job keeping the heap size at the default minimum of 4 MB or, for GOGC=800, 33 MB. Throughput could be improved by allowing the application to use more of the available RAM; the gain from GOGC=100 to GOGC=800 shows how effective this strategy can be. The machine should have no problem supporting heaps many orders of magnitude larger.

Perhaps the default minimum heap size should be increased on machines with lots of cores and lots of RAM. For machines with > 8 cores even 100 MB * GOMAXPROCS (# of HW threads) seems modest.
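
For comparison, an application can already trade memory for fewer collections itself, without any runtime change, by raising GOGC at startup. Below is a minimal sketch using the existing runtime/debug API; the value 800 simply mirrors the GOGC=800 runs in these logs:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	// Equivalent to running with GOGC=800: the heap is allowed to grow to
	// roughly 9x the live data before the next collection is triggered.
	old := debug.SetGCPercent(800)
	fmt.Printf("GOGC raised from %d to 800 on %d Ps\n", old, runtime.GOMAXPROCS(0))

	// ... start the json benchmark server here ...
}
```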

@kostya-sh
Contributor

The most obvious difference between the Go 1.5 and Go 1.6 GC traces is the difference in the number of GCs. For example, in the test with GOGC=100 and 128 connections, Go 1.5 performed 1518 GCs vs. 3498 with Go 1.6.

The charts attached to the original comment illustrate this very well.
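
For reference, the GC counts above come straight from the gctrace logs, but the same figure can also be read in-process. A small sketch using runtime.ReadMemStats; the 15-second sleep is only a stand-in for the benchmark's measurement interval:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcCount reports how many GC cycles completed while fn was running.
func gcCount(fn func()) uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	n := gcCount(func() {
		// Stand-in for the 15 s of benchmark load against the json handler.
		time.Sleep(15 * time.Second)
	})
	fmt.Printf("GC cycles during the window: %d\n", n)
}
```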

@aclements
Member

The Go 1.6 GC is much better about hitting its target heap size than the Go 1.5 GC was, especially at small heap sizes. This is a good thing, but this particular program happens to use a very small heap, and the 1.5 GC was very prone to overshooting the target, so that bug worked to the program's advantage by reducing the number of GCs.

Below are CDFs of the overshoot for go15_100_128.log and go16_100_128.log. The red area highlights what fraction of GC cycles overshot the heap goal, and the X axis shows by how much.

[CDF charts for go15_100_128.log and go16_100_128.log]

I'm certainly not going to make Go worse at hitting the heap target. :) I also don't want to optimize for this benchmark at the expense of other programs because it seems quite artificial to me; I wouldn't expect "real" applications to be running on 40 cores, handling a large amount of load, and maintaining a 5 MB heap. In general, you expect a program's heap to scale with its load and any program handling a non-trivial load to easily exceed the minimum target. Likewise, I'm reluctant to raise the minimum heap target because it's only expected to apply to small programs (and it's meant to keep small programs small).

So, I suppose my conclusion is "working as intended" unless there are specific counter-proposals or compelling evidence that we need to do something different for heavy loads on small heaps.
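
As an aside, the overshoot in these logs can be reproduced from the gctrace output itself. The sketch below assumes the Go 1.6 gctrace line layout ("start->end->live MB, goal MB") and reports heap-at-end-of-mark minus goal per cycle; treat the parsing as an assumption about the log format, not a stable interface:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
)

// Matches the "start->end->live MB, goal MB" fields of a gctrace line,
// e.g. "gc 12 @1.2s 2%: ... 4->6->2 MB, 5 MB goal, 40 P" (Go 1.6 layout).
var gcLine = regexp.MustCompile(`(\d+)->(\d+)->(\d+) MB, (\d+) MB goal`)

func main() {
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		m := gcLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not a gc line (e.g. schedtrace output)
		}
		end, _ := strconv.Atoi(m[2])  // heap size when marking finished
		goal, _ := strconv.Atoi(m[4]) // heap goal for this cycle
		fmt.Printf("overshoot %+d MB (end %d MB, goal %d MB)\n", end-goal, end, goal)
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Feeding, say, go16_100_128.log through this on stdin gives the per-cycle overshoot that the CDFs above summarize.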

@kostya-sh
Contributor

@aclements, I agree that keeping the promise about the heap target is an improvement that shouldn't be sacrificed to achieve better results in this artificial benchmark.

If a developer notices that GC happens too frequently, then she can increase the GOGC value to improve performance. It would be even better if the Go runtime could do this automatically, and this benchmark gives some ideas for how to autotune GOGC.

An interesting observation is that scaling stopped when the GC frequency was somewhere between 80 and 180 collections per second (between 32 and 64 threads, 1141 to 2703 GCs in 15 sec). The exact number probably depends on the heap size as well. I wonder if the Go runtime could be clever enough to increase the heap size when GC happens too frequently?
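
To make the autotuning idea concrete, here is a rough user-level sketch rather than a runtime proposal: a goroutine watches the GC rate via runtime.ReadMemStats and raises GOGC with debug.SetGCPercent when collections exceed a threshold. The 100-per-second threshold and the 800 ceiling are arbitrary placeholders, not values derived from these benchmarks:

```go
package main

import (
	"runtime"
	"runtime/debug"
	"time"
)

// autotuneGOGC doubles GOGC whenever more than maxPerSec collections
// completed in the last second, up to ceiling. A real runtime heuristic
// would also need hysteresis and some notion of a memory budget.
func autotuneGOGC(maxPerSec uint32, ceiling int) {
	gogc := 100 // assume the default starting point
	var prev runtime.MemStats
	runtime.ReadMemStats(&prev)
	for range time.Tick(time.Second) {
		var cur runtime.MemStats
		runtime.ReadMemStats(&cur)
		if cur.NumGC-prev.NumGC > maxPerSec && gogc < ceiling {
			gogc *= 2
			debug.SetGCPercent(gogc)
		}
		prev = cur
	}
}

func main() {
	go autotuneGOGC(100, 800) // placeholder threshold and ceiling
	select {}                 // stand-in for the real server workload
}
```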
