
runtime: system experienced long pauses during GC marking #27410

Closed

fmstephe opened this issue Aug 31, 2018 · 11 comments
Labels
compiler/runtime: Issues related to the Go compiler and/or runtime.
FrozenDueToAge
GarbageCollector
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
WaitingForInfo: Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@fmstephe

fmstephe commented Aug 31, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10.3 linux/amd64

Does this issue reproduce with the latest release?

We don't know. Unfortunately we can't run this exact experiment again.

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

We were running a tool that migrates data between two database systems. It maintains two in-memory caches (implemented as sharded maps), which became very large. As the heap grew we experienced very uneven performance.

We ran a series of experiments capping the size of the caches to compare the system operation at different cache sizes.

We took traces of the running system to compare the behaviour and see if we could get a better understanding of what was causing the performance degradation.

Unfortunately we had to force the garbage collector to run during the trace to ensure that we would actually see it. So the traces contain an artificial GC cycle, which won't behave exactly the same as a naturally occurring one.
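For anyone who wants to reproduce the setup, forcing a collection inside the trace window looks roughly like the sketch below. This is a minimal illustration, not our actual tool; the file name and the workload placeholder are made up.

```go
package main

import (
	"os"
	"runtime"
	"runtime/trace"
)

func main() {
	// Write the execution trace to a file (placeholder name).
	f, err := os.Create("trace.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		panic(err)
	}
	defer trace.Stop()

	// ... run the workload under investigation here ...

	// Force a full collection so the trace is guaranteed to contain a
	// GC cycle, even if a natural one would not have started yet.
	runtime.GC()
}
```

The resulting file can then be inspected with `go tool trace trace.out`.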

What did you expect to see?

Roughly even performance as the heap grew, with no long pauses attributable to the GC.

What did you see instead?

Repeated and dramatically long gaps during the mark phase, on the order of 10-20 milliseconds. During these periods no network events occurred, effectively pausing the system.

We have written up an account of what we saw here

https://docs.google.com/document/d/1-DO8jp9q0ONuo9nL1EXVm9unzMHLA0KUkrisAWxjaRo/edit?usp=sharing

My company is reluctant to attach traces directly here, as they contain stack traces from private code. But we are happy to send them directly to individuals who would like to see them.

@ALTree ALTree changed the title System experienced long pauses during GC marking runtime: system experienced long pauses during GC marking Aug 31, 2018
@ALTree ALTree added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. GarbageCollector labels Aug 31, 2018
@ALTree ALTree added this to the Go1.12 milestone Aug 31, 2018
@josharian
Contributor

@aclements @RLH

@ysmolski
Member

ysmolski commented Sep 4, 2018

The Google document mentioned above cannot be accessed.

What was the approximate size of the allocated heap?

@fmstephe
Author

fmstephe commented Sep 4, 2018

@ysmolsky We experienced similar-sized pauses during the mark phase. We tested heaps of the following sizes:

8-13 GB
20-30 GB
60-70 GB

We saw a lot of 5-20ms pauses (where a pause is measured as a period in which no network events are recorded in the trace). The larger heap sizes saw a greater performance impact, but it looks like that was simply because the mark phase ran for much longer.
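For context, one way to sample the approximate live heap size is runtime.ReadMemStats; this is a generic sketch, not necessarily how we measured it:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// Periodically report the live heap and the target heap size for the
	// next collection. Note that ReadMemStats briefly stops the world.
	for {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		fmt.Printf("heap=%d MiB, next GC target=%d MiB\n",
			m.HeapAlloc>>20, m.NextGC>>20)
		time.Sleep(10 * time.Second)
	}
}
```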

In particular we saw a huge number of idle-GC slices in the trace. If I were to guess, I would say the scheduler was struggling to schedule all of the idle-GC goroutines as well as meaningful work.

If you request access to the Google doc, I will grant it as soon as I can.
I need to work out how to publish it without using Google Docs, as that makes reading the doc really hard.

@agnivade
Contributor

Likely a dupe of #27732. @dr2chase ?

@fmstephe
Author

We saw no mark assists in the trace. I specifically looked for them, because that was my first thought.

The way we forced the GC to run during the trace also makes assists less likely, because an early GC run is less likely to find that the mutators are outpacing it and ask for assists (that is my understanding).

The unusual feature of this test is that the system was running on a machine with 48 cores and GOMAXPROCS unset. It looks like it was trying to use all the cores during GC, but most GC slices are idle.
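For anyone trying to reproduce this: GOMAXPROCS can be pinned explicitly rather than left to default to every logical CPU. A minimal sketch (the value 8 is purely illustrative):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// By default GOMAXPROCS equals the number of logical CPUs, so the GC
	// can spread work across every core. Pin it to a smaller, illustrative
	// value; GOMAXPROCS returns the previous setting.
	prev := runtime.GOMAXPROCS(8)
	fmt.Printf("GOMAXPROCS: was %d, now %d\n", prev, runtime.GOMAXPROCS(0))
}
```

The same effect can be achieved without a code change by setting the GOMAXPROCS environment variable when the process is started.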

@fmstephe
Author

fmstephe commented Oct 5, 2018

The blog post has been published:

http://big-elephants.com/2018-09/unexpected-gc-pauses/

If anything written there is incorrect, I am happy to make edits.

It should be easier to access than the Google doc.

@aclements aclements modified the milestones: Go1.12, Go1.13 Jan 8, 2019
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@dr2chase
Contributor

Crud, I just now saw this.
Is this still a problem?
And how many processors are there, and is the Go process the only significant load on the box?
(I.e., is it possible that there is competition with other processes?)

@agnivade agnivade added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jan 13, 2020
@fmstephe
Author

The particular program where we saw this behaviour was a one-off. We ran it for about a week and it was never run again. It more or less had the whole machine to itself.

This experience was somewhat accidental, in that the service tested here was deployed without a GOMAXPROCS setting, so it naturally used all the cores available. We don't experience this generally today, or at least we are not aware of experiencing it.

If we see any behaviour like this again I'll be sure to post new information.

From my perspective, I think this issue should probably be closed, just out of pragmatism. The version of Go is out of date and there's no practical way for us to reproduce this today.

I am personally very interested in this behaviour, so we may try to reproduce it in a different system; if we do, I can post a new issue.

@fmstephe
Author

Oh, something I forgot to mention above: I still have all the traces that were used to make that blog post. If someone wants to dig into them, I am very happy to send them.

@fmstephe
Author

@dr2chase after re-reading your comment I realised I didn't answer one of your questions.

The system was running with 42 cores available.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@seankhliao seankhliao added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Dec 26, 2022
@gopherbot

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@golang golang locked and limited conversation to collaborators Jan 26, 2024