runtime: GC causes latency spikes with single processor #18534

Closed
tarm opened this issue Jan 6, 2017 · 14 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@tarm

tarm commented Jan 6, 2017

(This may be the same as #14812, but the discussion for that issue has veered towards sync.Pool and there are many processors available in the traces people have posted to that issue.)

What version of Go are you using (go version)?

go1.7.4 Testing with the latest master branch does not make a difference for my real application.

What operating system and processor architecture are you using (go env)?

Linux on an x64 single-core 1.4GHz Atom processor

What did you do?

It does not run on the playground (it takes too long and cannot write a trace file), but here's a minimal test case that reproduces the issue:
https://play.golang.org/p/aJmmumzKpD
(original example that was slightly more complicated: https://play.golang.org/p/_tQTfBR33g)

Specifically, it reproduces the approximate amount of memory, GC frequency, and GC irregularity that I see in our application. Concretely, this test case has GCs that collect 50-100MB but also sometimes cause application delays with long tails on the order of 100ms.

To mimic the single core processor from my laptop, I was running the test with CPU affinity like this: "GODEBUG=gcpacertrace=1,gctrace=1 taskset -c 0 ./main"
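
For reference, the reproducer has roughly this shape (a paraphrased sketch, not the exact code behind the playground links; the child count, allocation sizes, and loop bounds are placeholders):

package main

import (
    "fmt"
    "math/rand"
    "sort"
    "time"
)

// childLoop allocates steadily so that the heap churns and a GC runs
// roughly every second, mimicking the real application's garbage rate.
func childLoop() {
    var keep [][]byte
    for {
        keep = append(keep, make([]byte, 64<<10)) // 64KB per iteration
        if len(keep) > 128 {                      // cap each child's live heap (~8MB)
            keep = keep[:0]
        }
        time.Sleep(time.Duration(rand.Intn(5)) * time.Millisecond)
    }
}

func main() {
    for i := 0; i < 8; i++ {
        go childLoop()
    }

    const period = 100 * time.Millisecond
    ticker := time.NewTicker(period)
    defer ticker.Stop()

    var deviations []time.Duration
    prev := time.Now()
    for i := 0; i < 1000; i++ {
        <-ticker.C
        now := time.Now()
        d := now.Sub(prev) - period // deviation from the ideal 100ms spacing
        deviations = append(deviations, d)
        if d > 10*time.Millisecond {
            fmt.Printf("Higher latency than expected at iteration %d.  Tick was delayed %v\n", i, d)
        }
        prev = now
    }

    sort.Slice(deviations, func(i, j int) bool { return deviations[i] < deviations[j] })
    for _, p := range []float64{0, 1, 5, 50, 95, 99, 100} {
        idx := int(p / 100 * float64(len(deviations)-1))
        fmt.Printf("%6.1f%% %v\n", p, deviations[idx])
    }
}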

What did you expect to see?

The main goroutine is driven off a 100ms tick. Ideally that tick would arrive every 100ms, and the main goroutine would then run immediately after the tick and be done.

What did you see instead?

Sometimes the tick gets delayed, and sometimes the main goroutine gets delayed even after the tick fires. Measuring the deviation from the expected 100ms interval, I see a latency distribution like this:

  0.0% -64.359857ms
  1.0% -41.943823ms
  5.0% -19.403262ms
  50.0% 121.887µs
  95.0% 20.226485ms
  99.0% 40.27582ms
  100.0% 75.602625ms

The mean is as expected, but the 75ms deviation is much higher than we want to see. (The negative values are the ticker catching up after previous delays).

The gctrace output before the 75ms delay looks like this:

gc 79 @38.948s 10%: 0.099+133+0.59 ms clock, 0.099+78/33/0+0.59 ms cpu, 72->78->70 MB, 78 MB goal, 1 P
pacer: sweep done at heap size 76MB; allocated 6MB of spans; swept 10013 pages at +2.094311e-004 pages/byte
pacer: assist ratio=+4.354140e+000 (scan 103 MB in 116->140 MB) workers=0+1
Higher latency than expected at iteration 395.  Tick was delayed 26.735579ms
pacer: H_m_prev=73777328 h_t=+6.622503e-001 H_T=122636387 h_a=+7.576055e-001 H_a=129671440 h_g=+1.000000e+000 H_g=147554656 u_a=+4.951149e-001 u_g=+2.500000e-001 W_a=51445176 goalΔ=+3.377497e-001 actualΔ=+9.535521e-002 u_a/u_g=+1.980459e+000
gc 80 @39.688s 10%: 0.11+118+0.26 ms clock, 0.11+29/30/28+0.26 ms cpu, 116->123->55 MB, 140 MB goal, 1 P
pacer: sweep done at heap size 55MB; allocated 0MB of spans; swept 15855 pages at +3.785020e-004 pages/byte
pacer: assist ratio=+5.837807e+000 (scan 85 MB in 96->111 MB) workers=0+1
pacer: H_m_prev=58283288 h_t=+7.367016e-001 H_T=101220679 h_a=+7.406318e-001 H_a=101449744 h_g=+1.000000e+000 H_g=116566576 u_a=+2.500000e-001 u_g=+2.500000e-001 W_a=34801104 goalΔ=+2.632984e-001 actualΔ=+3.930191e-003 u_a/u_g=+1.000000e+000
gc 81 @40.666s 9%: 0.11+56+0.20 ms clock, 0.11+0/16/39+0.20 ms cpu, 96->96->37 MB, 111 MB goal, 1 P
pacer: sweep done at heap size 37MB; allocated 0MB of spans; swept 12413 pages at +3.744516e-004 pages/byte
pacer: assist ratio=+1.236822e+001 (scan 62 MB in 70->75 MB) workers=0+1
Higher latency than expected at iteration 409.  Tick was delayed 75.602625ms

My lay-person reading of the trace is that the "child" garbage-generating goroutines are asked to assist with the GC, but then they end up blocking execution of both the runtime.timerproc goroutine and the main goroutine.

This seems like a scheduler issue: goroutines that generate minimal garbage should be prioritized over goroutines assisting with the GC.

gctrace=1,gcpacertrace=1 output

Runtime trace: the 74ms delay occurs around 41 seconds into the trace. The tick should have fired at 41003ms but did not fire until 41079ms, and the G1 goroutine did not run until 41140ms. The complete trace is split into multiple chunks, but this GC can be found 12.6s into its chunk.
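
(The execution trace was written out with runtime/trace; here is a minimal sketch of that instrumentation, with an arbitrary output file name and a stub runWorkload standing in for the ticker/childLoop code above:)

package main

import (
    "log"
    "os"
    "runtime/trace"
)

func main() {
    // Write the execution trace alongside the gctrace/gcpacertrace output.
    f, err := os.Create("trace.out")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    if err := trace.Start(f); err != nil {
        log.Fatal(err)
    }
    defer trace.Stop()

    runWorkload() // stand-in for the ticker + childLoop workload sketched earlier

    // Inspect afterwards with: go tool trace trace.out
}

// runWorkload stands in for the real reproducer body.
func runWorkload() {}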

@aclements aclements self-assigned this Jan 6, 2017
@aclements aclements added this to the Go1.9 milestone Jan 6, 2017
@tarm
Author

tarm commented Jan 31, 2017

@RLH @aclements @dvyukov Is this on your radar for go1.9? Have you had a chance to try the example code yet?

I tried the patches that Austin posted for #14812, but those did not resolve the issue in the example that I posted. I also tried playing around with more scheduler options in the runtime, but I'm not familiar with the GC and scheduler and none of the basic "reschedule now" changes that I made seemed to help.

@quentinmit
Contributor

Your taskset invocation does not accurately model the behavior of a single CPU system. In particular, Go is still going to use a larger GOMAXPROCS, which is actively going to make this worse by trying to schedule multiple threads on a single physical core.

I think you should be able to get closer by setting GOMAXPROCS=1 when running the test, in addition to taskset. What does that do to the latency profile?

@minux
Member

minux commented Jan 31, 2017 via email

@tarm
Author

tarm commented Mar 18, 2017

I am also able to reproduce this issue on darwin amd64 with go1.8 and GOMAXPROCS=1.

$ GOMAXPROCS=1 ./bug18534
  0.0% -68.680343ms
  1.0% -36.249343ms
  5.0% -21.846205ms
  50.0% 33.946µs
  95.0% 21.279699ms
  99.0% 35.011957ms
  100.0% 58.775745ms
$ ./bug18534             # 4 cores, 8 procs
  0.0% -6.406381ms
  1.0% -4.595611ms
  5.0% -3.099409ms
  50.0% -22.001µs
  95.0% 3.246163ms
  99.0% 4.806314ms
  100.0% 7.189159ms

The scheduler/GC seems to need those extra cores to keep latency down.

@tarm
Author

tarm commented May 19, 2017

Some more data points:

$ taskset -c 0 ./issue18534 
  0.0% -70.533562ms
  1.0% -52.267319ms
  5.0% -35.006899ms
  50.0% -2.676µs
  95.0% 36.313112ms
  99.0% 57.591539ms
  100.0% 86.679896ms

GOMAXPROCS=2 taskset -c 0 ./issue18534 
  0.0% -61.117609ms
  1.0% -31.714017ms
  5.0% -21.586722ms
  50.0% 949ns
  95.0% 20.895062ms
  99.0% 33.908325ms
  100.0% 62.686614ms

$ GOMAXPROCS=4 taskset -c 0 ./issue18534 
  0.0% -33.988405ms
  1.0% -26.603924ms
  5.0% -11.102534ms
  50.0% 1.049µs
  95.0% 10.910585ms
  99.0% 23.701329ms
  100.0% 34.930726ms

$ GOMAXPROCS=5 taskset -c 0 ./issue18534
  0.0% -41.173521ms
  1.0% -15.659458ms
  5.0% -7.67135ms
  50.0% 70ns
  95.0% 7.62678ms
  99.0% 15.607855ms
  100.0% 41.169155ms

$ GOMAXPROCS=8 taskset -c 0 ./issue18534 
  0.0% -17.46882ms
  1.0% -10.194657ms
  5.0% -5.408951ms
  50.0% -315ns
  95.0% 5.410479ms
  99.0% 10.139459ms
  100.0% 16.585539ms

$ GOMAXPROCS=16 taskset -c 0 ./issue18534 
  0.0% -21.709796ms
  1.0% -8.175178ms
  5.0% -4.094512ms
  50.0% -173ns
  95.0% 4.09989ms
  99.0% 8.111605ms
  100.0% 21.653956ms

(top indicates that taskset is constraining the process to 1 core as expected.)

My understanding is that setting GOMAXPROCS will force the runtime to create more threads. Even when those threads are running on the same core and have to be scheduled by the OS (Linux in this case), that seems to lead to lower latency than just letting the Go runtime schedule a single thread.
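
A quick way to see the thread-count effect (my own sketch, separate from the reproducer) is to compare the threadcreate profile count at different GOMAXPROCS values:

package main

import (
    "fmt"
    "runtime"
    "runtime/pprof"
    "sync"
)

// burn spins so the runtime has a reason to run goroutines on separate Ps.
func burn(wg *sync.WaitGroup) {
    defer wg.Done()
    x := 0
    for i := 0; i < 50000000; i++ {
        x += i
    }
    _ = x
}

func main() {
    for _, procs := range []int{1, 4, 8} {
        runtime.GOMAXPROCS(procs)
        var wg sync.WaitGroup
        for i := 0; i < procs; i++ {
            wg.Add(1)
            go burn(&wg)
        }
        wg.Wait()
        // The threadcreate profile counts OS threads created so far.
        fmt.Printf("GOMAXPROCS=%d: OS threads created so far: %d\n",
            procs, pprof.Lookup("threadcreate").Count())
    }
}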

Here are some measurements without taskset, but with different values for GOMAXPROCS:

$  ./issue18534 # 40 logical cores, GOMAXPROCS set by default
  0.0% -8.913432ms
  1.0% -103.472µs
  5.0% -51.566µs
  50.0% -56ns
  95.0% 50.329µs
  99.0% 107.945µs
  100.0% 8.917999ms

$ GOMAXPROCS=4 ./issue18534 
  0.0% -16.824243ms
  1.0% -2.714634ms
  5.0% -80.119µs
  50.0% -358ns
  95.0% 82.118µs
  99.0% 2.521557ms
  100.0% 16.79671ms

$ GOMAXPROCS=5 ./issue18534 
  0.0% -11.703387ms
  1.0% -205.105µs
  5.0% -72.006µs
  50.0% -1.152µs
  95.0% 72.07µs
  99.0% 181.71µs
  100.0% 11.744029ms

$ GOMAXPROCS=4 ./issue18534 
  0.0% -13.447356ms
  1.0% -5.392172ms
  5.0% -89.006µs
  50.0% -401ns
  95.0% 90.569µs
  99.0% 5.325731ms
  100.0% 13.443492ms

$ GOMAXPROCS=5 ./issue18534 
  0.0% -10.005695ms
  1.0% -205.612µs
  5.0% -68.098µs
  50.0% 241ns
  95.0% 64.149µs
  99.0% 186.699µs
  100.0% 10.021666ms

$ GOMAXPROCS=8 ./issue18534 
  0.0% -180.467µs
  1.0% -106.693µs
  5.0% -56.36µs
  50.0% -228ns
  95.0% 56.368µs
  99.0% 91.327µs
  100.0% 213.66µs

It's interesting that there is a consistent change in the 99% latency between GOMAXPROCS=4 and =5. I suspect that is an artifact of some decision within the runtime about the number of idle GC workers, but I have not looked at a trace. A similar-looking change can also be seen in the taskset samples when going from GOMAXPROCS=4 to 5.

To me, these samples point to a deficiency in the Go runtime's handling of scheduling/preemption during GC. That issue is probably hidden on systems with more than 4 cores, which many server systems have these days.

@RLH
Contributor

RLH commented May 19, 2017 via email

@aclements aclements added the NeedsInvestigation label Jun 6, 2017
@aclements aclements modified the milestones: Go1.9Maybe, Go1.9 Jul 18, 2017
@aclements
Member

I re-ran your reproducer with current master, which annotates mark assists in the execution trace, and added a go badness() just before the "high latency" print so I'd have something to search for in the execution trace.

This does appear to be caused by GC and scheduling, though several things have to go wrong simultaneously. Here's the new trace I'm analyzing with tip.

The worst delay is a 57ms delay that gets reported at 1,571.288 ms into the trace. First, a fractional mark worker starts at 1,473.412 ms. Since there's just one P, this necessarily delays all other work. This is supposed to run for at most 10ms, but for reasons I don't fully understand runs for 20ms. This delays the timer goroutine, so six wakeups all happen at 1,493.612 ms when the fractional worker finishes. It so happens these are all childLoops that happened to randomly pick short sleeps. These childLoops each run for ~20ms, partly because they're doing a lot of assist work (#14951) and partly because they take a long time to run even when not assisting. Again, I'm not sure why this is 20ms instead of 10ms. Together, these consume the next 60ms of CPU time. The mainLoop wake-up is supposed to happen fairly early in this 60ms window. Because these are saturating the CPU, again, the timer goroutine doesn't get to run, so another three timer wakeups pile up until 1,557.327 ms. One of these (it happens to be the last one serviced) is mainLoop.

I need to dig into why all of these preemptions are happening at 20ms. My guess is that sysmon has fallen back to a 10ms tick and has to observe the same goroutine running for two ticks in a row before preempting it, which seems a bit bogus. I suspect fixing that and fixing the over-assist problem (#14951) would fix this issue.
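
(For anyone repeating this analysis, the marker trick is roughly the following sketch; badness is just a distinctive, do-nothing function to search for in go tool trace, and the 10ms threshold is arbitrary.)

package main

import (
    "fmt"
    "time"
)

// badness does nothing; it exists only so that a goroutine named
// main.badness appears in the execution trace at the moment a slow tick
// is detected, giving the trace viewer something easy to search for.
func badness() {}

func main() {
    const period = 100 * time.Millisecond
    ticker := time.NewTicker(period)
    defer ticker.Stop()

    prev := time.Now()
    for i := 0; i < 1000; i++ {
        <-ticker.C
        now := time.Now()
        if d := now.Sub(prev) - period; d > 10*time.Millisecond {
            go badness() // searchable marker in the trace
            fmt.Printf("Higher latency than expected at iteration %d.  Tick was delayed %v\n", i, d)
        }
        prev = now
    }
}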

@aclements aclements modified the milestones: Go1.10, Go1.9Maybe Jul 18, 2017
@aclements aclements added the early-in-cycle label Jul 20, 2017
@gopherbot

Change https://golang.org/cl/59970 mentions this issue: runtime: separate soft and hard heap limits

@gopherbot

Change https://golang.org/cl/59971 mentions this issue: runtime: reduce background mark work from 25% to 20%

@tarm
Author

tarm commented Aug 30, 2017

@aclements Thanks for your work on this!

I cherry-picked your 2 patches onto tip and ran our actual application (not just the test reproduction case in this ticket):

Go v1.9.0:

I0830 00:43:14.905531  196377 main.go:804] logic loop time distribution (10000 samples): min = 4.808µs, 50% <= 11.472µs, 90% <= 18.879µs, 99% <= 5.018271ms, 99.9% <= 8.561956ms, max = 99.805794ms
I0830 00:59:59.306973  196377 main.go:804] logic loop time distribution (10000 samples): min = 5.443µs, 50% <= 10.039µs, 90% <= 16.947µs, 99% <= 53.701µs, 99.9% <= 11.057286ms, max = 136.385102ms
I0830 01:16:46.312292  196377 main.go:804] logic loop time distribution (10000 samples): min = 5.516µs, 50% <= 15.208µs, 90% <= 20.33µs, 99% <= 5.425571ms, 99.9% <= 12.001578ms, max = 122.486144ms

Go tip (d77d4f5) + patches applied

I0830 00:19:54.003992  185790 main.go:804] logic loop time distribution (10000 samples): min = 4.541µs, 50% <= 14.558µs, 90% <= 18.88µs, 99% <= 30.912µs, 99.9% <= 6.240578ms, max = 25.673079ms
I0830 00:36:35.013456  185790 main.go:804] logic loop time distribution (10000 samples): min = 4.521µs, 50% <= 17.13µs, 90% <= 20.187µs, 99% <= 29.954µs, 99.9% <= 6.296957ms, max = 12.587176ms
I0830 00:53:18.005010  185790 main.go:804] logic loop time distribution (10000 samples): min = 4.688µs, 50% <= 10.847µs, 90% <= 17.855µs, 99% <= 26.167µs, 99.9% <= 5.468955ms, max = 7.078815ms
I0830 01:10:01.610305  185790 main.go:804] logic loop time distribution (10000 samples): min = 5.305µs, 50% <= 13.549µs, 90% <= 19.775µs, 99% <= 31.4µs, 99.9% <= 6.350999ms, max = 16.407586ms

The 99.9% latencies moved from about 10ms -> 6ms, but the worst-case latencies (out of 10k samples) went from ~120ms -> ~15ms. That is a huge improvement in the worst-case mutator latencies and would be enough to resolve this issue for us.

The PRs are still marked as DO NOT SUBMIT until benchmarking is complete. I am eager to see them merged, so let me know if there is more I can help with to move them along. (My real application does not create that much garbage, and so is not useful for benchmarking odd GC assist interactions.)

@aclements
Member

@tarm, that's great! Could you also check what effect the patches have on heap size and throughput?

You're running with GOMAXPROCS=1, right? There's still an issue with fractional workers running for a long time without preemption, which I suspect is responsible for the 15ms worst-case you saw. I just filed an issue focused on that problem (#21698).

@tarm
Author

tarm commented Aug 31, 2017

For the recent benchmarking, I have been running with taskset -c 0, but that should constrain it to one core at least as well as GOMAXPROCS=1 would.

This application is tick-based rather than run-to-completion, so throughput is more difficult to measure. Over about 12 hours of running, the accumulated CPU time (as reported by top) was about 1.4% higher in the version with the patches, but that also includes the changes from v1.9.0 to tip, so some of that may not be from the patches.

The application avoids allocation and is not a heavy user of RAM, so it is also not a great stressor of heap size, and we have not specifically instrumented for heap usage. That said, the memory usage reported by top was about 1% higher for the new version.
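
If it would help, instrumenting heap usage directly would be straightforward with something like the following sketch (using runtime.ReadMemStats; the interval and the fields logged are arbitrary choices):

package main

import (
    "log"
    "runtime"
    "time"
)

// reportHeap logs heap size and GC counters periodically. ReadMemStats
// briefly stops the world, so the interval should be generous.
func reportHeap(every time.Duration) {
    var m runtime.MemStats
    for range time.Tick(every) {
        runtime.ReadMemStats(&m)
        log.Printf("heap: alloc=%dMB sys=%dMB numGC=%d pauseTotal=%v",
            m.HeapAlloc>>20, m.HeapSys>>20, m.NumGC,
            time.Duration(m.PauseTotalNs))
    }
}

func main() {
    go reportHeap(10 * time.Second)
    select {} // stand-in for the real application's main loop
}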

@gopherbot

Change https://golang.org/cl/60790 mentions this issue: cmd/trace: add minimum mutator utilization (MMU) plot

@gopherbot

Change https://golang.org/cl/68573 mentions this issue: runtime: preempt fractional worker after reaching utilization goal

gopherbot pushed a commit that referenced this issue Oct 13, 2017
Currently fractional workers run until preempted by the scheduler,
which means they typically run for 20ms. During this time, all other
goroutines on that P are blocked, which can introduce significant
latency variance.

This modifies fractional workers to self-preempt shortly after
achieving the fractional utilization goal. In practice this means they
preempt much sooner, and the scale of their preemption is on the order
of how often the user goroutines block (so, if the application is
compute-bound, the fractional workers will also run for long times,
but if the application blocks frequently, the fractional workers will
also preempt quickly).

Fixes #21698.
Updates #18534.

Change-Id: I03a5ab195dae93154a46c32083c4bb52415d2017
Reviewed-on: https://go-review.googlesource.com/68573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
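
(A rough illustration of the self-preemption idea described in the commit message above; this is a toy model written for this discussion, not the runtime's actual code, and the utilization goal and chunk durations are placeholders.)

package main

import (
    "fmt"
    "time"
)

// fractionalGoal is the fraction of one P this worker is allowed to
// consume over a GC cycle (placeholder value for illustration).
const fractionalGoal = 0.25

// fractionalWorker does mark work in small chunks and returns (that is,
// self-preempts) as soon as its accumulated work time exceeds its share
// of the elapsed cycle, instead of running until the scheduler kicks it
// off the P, which previously took ~20ms.
func fractionalWorker(cycleStart time.Time, workDone *time.Duration, chunk func()) {
    for {
        t := time.Now()
        chunk() // a small, bounded amount of mark work
        *workDone += time.Since(t)

        budget := time.Duration(fractionalGoal * float64(time.Since(cycleStart)))
        if *workDone >= budget {
            return // yield the P back to user goroutines
        }
    }
}

func main() {
    cycleStart := time.Now()
    var workDone time.Duration
    for run := 1; run <= 5; run++ {
        // The scheduler would restart the worker later in the cycle; the
        // sleep simulates the gaps during which user goroutines run.
        time.Sleep(2 * time.Millisecond)
        fractionalWorker(cycleStart, &workDone, func() { time.Sleep(200 * time.Microsecond) })
        fmt.Printf("after run %d: %v of mark work in %v of cycle\n",
            run, workDone.Round(100*time.Microsecond), time.Since(cycleStart).Round(time.Millisecond))
    }
}
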
gopherbot pushed a commit that referenced this issue Oct 31, 2017
Currently, the background mark worker and the goal GC CPU are
both fixed at 25%. The trigger controller's goal is to achieve the
goal CPU usage, and with the previous commit it can actually achieve
this. But this means there are *no* assists, which sounds ideal but
actually causes problems for the trigger controller. Since the
controller can't lower CPU usage below the background mark worker CPU,
it saturates at the CPU goal and no longer gets feedback, which
translates into higher variability in heap growth.

This commit fixes this by allowing assists 5% CPU beyond the 25% fixed
background mark. This avoids saturating the trigger controller, since
it can now get feedback from both sides of the CPU goal. This leads to
low variability in both CPU usage and heap growth, at the cost of
reintroducing a low rate of mark assists.

We also experimented with 20% background plus 5% assist, but 25%+5%
clearly performed better in benchmarks.

Updates #14951.
Updates #14812.
Updates #18534.

Combined with the previous CL, this significantly improves tail
mutator utilization in the x/benchmarks garbage benchmark. On a sample
trace, it increased the 99.9%ile mutator utilization at 10ms from 26%
to 59%, and at 5ms from 17% to 52%. It reduced the 99.9%ile zero
utilization window from 2ms to 700µs. It also helps the mean mutator
utilization: it increased the 10s mutator utilization from 83% to 94%.
The minimum mutator utilization is also somewhat improved, though
there is still some unknown artifact that causes a minuscule fraction
of mutator assists to take 5--10ms (in fact, there was exactly one
10ms mutator assist in my sample trace).

This has no significant effect on the throughput of the
github.com/dr2chase/bent benchmarks-50.

This has little effect on the go1 benchmarks (and the slight overall
improvement makes up for the slight overall slowdown from the previous
commit):

name                      old time/op    new time/op    delta
BinaryTree17-12              2.40s ± 0%     2.41s ± 1%  +0.26%  (p=0.010 n=18+19)
Fannkuch11-12                2.95s ± 0%     2.93s ± 0%  -0.62%  (p=0.000 n=18+15)
FmtFprintfEmpty-12          42.2ns ± 0%    42.3ns ± 1%  +0.37%  (p=0.001 n=15+14)
FmtFprintfString-12         67.9ns ± 2%    67.2ns ± 3%  -1.03%  (p=0.002 n=20+18)
FmtFprintfInt-12            75.6ns ± 3%    76.8ns ± 2%  +1.59%  (p=0.000 n=19+17)
FmtFprintfIntInt-12          123ns ± 1%     124ns ± 1%  +0.77%  (p=0.000 n=17+14)
FmtFprintfPrefixedInt-12     148ns ± 1%     150ns ± 1%  +1.28%  (p=0.000 n=20+20)
FmtFprintfFloat-12           212ns ± 0%     211ns ± 1%  -0.67%  (p=0.000 n=16+17)
FmtManyArgs-12               499ns ± 1%     500ns ± 0%  +0.23%  (p=0.004 n=19+16)
GobDecode-12                6.49ms ± 1%    6.51ms ± 1%  +0.32%  (p=0.008 n=19+19)
GobEncode-12                5.47ms ± 0%    5.43ms ± 1%  -0.68%  (p=0.000 n=19+20)
Gzip-12                      220ms ± 1%     216ms ± 1%  -1.66%  (p=0.000 n=20+19)
Gunzip-12                   38.8ms ± 0%    38.5ms ± 0%  -0.80%  (p=0.000 n=19+20)
HTTPClientServer-12         78.5µs ± 1%    78.1µs ± 1%  -0.53%  (p=0.008 n=20+19)
JSONEncode-12               12.2ms ± 0%    11.9ms ± 0%  -2.38%  (p=0.000 n=17+19)
JSONDecode-12               52.3ms ± 0%    53.3ms ± 0%  +1.84%  (p=0.000 n=19+20)
Mandelbrot200-12            3.69ms ± 0%    3.69ms ± 0%  -0.19%  (p=0.000 n=19+19)
GoParse-12                  3.17ms ± 1%    3.19ms ± 1%  +0.61%  (p=0.000 n=20+20)
RegexpMatchEasy0_32-12      73.7ns ± 0%    73.2ns ± 1%  -0.66%  (p=0.000 n=17+20)
RegexpMatchEasy0_1K-12       238ns ± 0%     239ns ± 0%  +0.32%  (p=0.000 n=17+16)
RegexpMatchEasy1_32-12      69.1ns ± 1%    69.2ns ± 1%    ~     (p=0.669 n=19+13)
RegexpMatchEasy1_1K-12       365ns ± 1%     367ns ± 1%  +0.49%  (p=0.000 n=19+19)
RegexpMatchMedium_32-12      104ns ± 1%     105ns ± 1%  +1.33%  (p=0.000 n=16+20)
RegexpMatchMedium_1K-12     33.6µs ± 3%    34.1µs ± 4%  +1.67%  (p=0.001 n=20+20)
RegexpMatchHard_32-12       1.67µs ± 1%    1.62µs ± 1%  -2.78%  (p=0.000 n=18+17)
RegexpMatchHard_1K-12       50.3µs ± 2%    48.7µs ± 1%  -3.09%  (p=0.000 n=19+18)
Revcomp-12                   384ms ± 0%     386ms ± 0%  +0.59%  (p=0.000 n=19+19)
Template-12                 61.1ms ± 1%    60.5ms ± 1%  -1.02%  (p=0.000 n=19+20)
TimeParse-12                 307ns ± 0%     303ns ± 1%  -1.23%  (p=0.000 n=19+15)
TimeFormat-12                323ns ± 0%     323ns ± 0%  -0.12%  (p=0.011 n=15+20)
[Geo mean]                  47.1µs         47.0µs       -0.20%

https://perf.golang.org/search?q=upload:20171030.4

It slightly improves the performance of the x/benchmarks:

name                         old time/op  new time/op  delta
Garbage/benchmem-MB=1024-12  2.29ms ± 3%  2.22ms ± 2%  -2.97%  (p=0.000 n=18+18)
Garbage/benchmem-MB=64-12    2.24ms ± 2%  2.21ms ± 2%  -1.64%  (p=0.000 n=18+18)
HTTP-12                      12.6µs ± 1%  12.6µs ± 1%    ~     (p=0.690 n=19+17)
JSON-12                      11.3ms ± 2%  11.3ms ± 1%    ~     (p=0.163 n=17+18)

and fixes some of the heap size bloat caused by the previous commit:

name                         old peak-RSS-bytes  new peak-RSS-bytes  delta
Garbage/benchmem-MB=1024-12          1.88G ± 2%          1.77G ± 2%  -5.52%  (p=0.000 n=20+18)
Garbage/benchmem-MB=64-12             248M ± 8%           226M ± 5%  -8.93%  (p=0.000 n=20+20)
HTTP-12                              47.0M ±27%          47.2M ±12%    ~     (p=0.512 n=20+20)
JSON-12                               206M ±11%           206M ±10%    ~     (p=0.841 n=20+20)

https://perf.golang.org/search?q=upload:20171030.5

Combined with the change to add a soft goal in the previous commit,
this achieves a decent performance improvement on the garbage
benchmark:

name                         old time/op  new time/op  delta
Garbage/benchmem-MB=1024-12  2.40ms ± 4%  2.22ms ± 2%  -7.40%  (p=0.000 n=19+18)
Garbage/benchmem-MB=64-12    2.23ms ± 1%  2.21ms ± 2%  -1.06%  (p=0.000 n=19+18)
HTTP-12                      12.5µs ± 1%  12.6µs ± 1%    ~     (p=0.330 n=20+17)
JSON-12                      11.1ms ± 1%  11.3ms ± 1%  +1.87%  (p=0.000 n=16+18)

https://perf.golang.org/search?q=upload:20171030.6

Change-Id: If04ddb57e1e58ef2fb9eec54c290eb4ae4bea121
Reviewed-on: https://go-review.googlesource.com/59971
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
@golang golang locked and limited conversation to collaborators Oct 31, 2018
@rsc rsc unassigned RLH Jun 23, 2022