
runtime: scavenger pacing fails to account for fragmentation [1.13 backport] #34149

Closed
gopherbot opened this issue Sep 6, 2019 · 7 comments
Labels
CherryPickApproved (used during the release process for point releases), FrozenDueToAge, Performance
Milestone

Comments

@gopherbot

@mknyszek requested issue #34048 to be considered for backport to the next 1.13 minor release.

Since this has the potential to cause a severe performance loss, I think we should backport this to Go 1.13.

@gopherbot Please open a backport issue for 1.13.

@gopherbot gopherbot added the CherryPickCandidate Used during the release process for point releases label Sep 6, 2019
@gopherbot gopherbot added this to the Go1.13.1 milestone Sep 6, 2019
@mknyszek mknyszek self-assigned this Sep 6, 2019
@bcmills bcmills modified the milestones: Go1.13.1, Go1.13.2 Sep 25, 2019
@bcmills
Contributor

bcmills commented Sep 25, 2019

this has the potential to cause a severe performance loss

@mknyszek, @aclements: could you describe the conditions under which this performance degradation occurs? (Is it predictable? Does it show up during testing? Is there a workaround available? Do we have a rough estimate for the fraction of users affected?)

@mknyszek
Contributor

mknyszek commented Sep 25, 2019

Is it predictable?

Yes. If the application's span fragmentation (the fraction of heap_inuse not occupied by live objects, i.e. 1 - heap_alloc/heap_inuse) reaches 10% or more, the scavenger can spiral into a state in which every free span is scavenged, leading to a large number of page faults and a large number of syscalls.
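To make the condition above concrete, here is a minimal Go sketch that computes span fragmentation from runtime.MemStats, assuming the definition 1 - HeapAlloc/HeapInuse; the function name and the threshold check are illustrative, not part of the runtime fix:

```go
package main

import (
	"fmt"
	"runtime"
)

// spanFragmentation estimates span fragmentation as the fraction of
// memory in in-use spans (HeapInuse) not occupied by live heap objects
// (HeapAlloc). HeapInuse is always >= HeapAlloc, so the result is in [0, 1].
func spanFragmentation() float64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	if ms.HeapInuse == 0 {
		return 0
	}
	return 1 - float64(ms.HeapAlloc)/float64(ms.HeapInuse)
}

func main() {
	frag := spanFragmentation()
	fmt.Printf("span fragmentation: %.1f%%\n", frag*100)
	if frag >= 0.10 {
		fmt.Println("at or above the 10% threshold where the Go 1.13 scavenger could misbehave")
	}
}
```

ReadMemStats stops the world briefly, so this is a diagnostic to run occasionally, not in a hot path.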

Does it show up during testing?

I don't have a specific test, but I can reproduce the problem with both large and small programs. I'd have to write a short program that reproduces it, but that shouldn't be difficult to do.

Is there a workaround available?

No. There is no workaround because the behavior is embedded in the runtime's pacing. You'd have to somehow keep your application's span fragmentation under 10%, which isn't reasonable to ask of anyone.

Do we have a rough estimate for the fraction of users affected?

No, and I'm not sure exactly how we'd be able to get that. We know for a fact that it severely affects Kubernetes' API latency and is one reason why they're blocked on moving to Go 1.13 (see #32828). Another (medium-sized, I suppose?) Google application consistently took 80% longer to run on Go 1.13 as a direct consequence of this issue. The patch in the non-backport bug fixes this behavior and brings performance back to Go 1.12 levels.

@bcmills
Contributor

bcmills commented Sep 25, 2019

By “Is it predictable?” and “Does it show up during testing?”, I meant more, “will users be able to predict and/or identify that they are affected?”

(That is: this seems more important to backport if it can crop up suddenly after a user has already tested and deployed a new build.)

@mknyszek
Contributor

Ah, I think I understand. Then it's not predictable. It can certainly show up suddenly in production, without anyone noticing while running tests, running microbenchmarks, or even a smaller version of the full application. It's as difficult to predict as how much heap fragmentation your application produces, which may be a function of input in some cases.

@aclements
Member

I agree that this should be backported for the reasons @mknyszek laid out.

Specifically, the CL to backport is https://golang.org/cl/193040.

@andybons andybons added CherryPickApproved Used during the release process for point releases and removed CherryPickCandidate Used during the release process for point releases labels Oct 2, 2019
@gopherbot
Author

Change https://golang.org/cl/198487 mentions this issue: [release-branch.go1.13] runtime: redefine scavenge goal in terms of heap_inuse

gopherbot pushed a commit that referenced this issue Oct 4, 2019
[release-branch.go1.13] runtime: redefine scavenge goal in terms of heap_inuse

This change makes it so that the scavenge goal is defined primarily in
terms of heap_inuse at the end of the last GC rather than next_gc. The
reason behind this change is that next_gc doesn't take into account
fragmentation, and we can fall into a situation where the scavenger thinks
it should have work to do but there's no free and unscavenged memory
available.

In order to ensure the scavenge goal still tracks next_gc, we multiply
heap_inuse by the ratio between the current heap goal and the last heap
goal, which describes whether the heap is growing or shrinking, and by
how much.

Finally, this change updates the documentation for scavenging and
elaborates on why the scavenge goal is defined the way it is.

Fixes #34149

Change-Id: I8deaf87620b5dc12a40ab8a90bf27932868610da
Reviewed-on: https://go-review.googlesource.com/c/go/+/193040
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
(cherry picked from commit 9b30811)
Reviewed-on: https://go-review.googlesource.com/c/go/+/198487
Run-TryBot: Andrew Bonventre <andybons@golang.org>
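The pacing rule described in the commit message above can be sketched as follows; the function and parameter names are illustrative and do not match the runtime's internal variables:

```go
package main

import "fmt"

// scavengeGoal sketches the retained-heap goal from the commit message:
// heap_inuse at the end of the last GC, scaled by the ratio of the current
// heap goal to the last heap goal, so the goal tracks whether the heap is
// growing or shrinking, and by how much. Integer math is exact for these
// inputs; a real implementation would need to guard against overflow.
func scavengeGoal(lastHeapInuse, lastHeapGoal, heapGoal uint64) uint64 {
	return lastHeapInuse * heapGoal / lastHeapGoal
}

func main() {
	// 100 MiB in use at the last GC; the heap goal grew from 200 MiB to
	// 220 MiB, so the scavenge goal scales up by 10% to 110 MiB.
	fmt.Println(scavengeGoal(100<<20, 200<<20, 220<<20)) // 115343360
}
```

Defining the goal this way means that even a heavily fragmented heap (heap_inuse well above heap_alloc) yields a goal the scavenger can actually meet, which is the failure mode the original pacing based on next_gc did not account for.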
@gopherbot
Author

Closed by merging cd951ae to release-branch.go1.13.

@katiehockman katiehockman modified the milestones: Go1.13.2, Go1.13.3 Oct 17, 2019
@golang golang locked and limited conversation to collaborators Oct 16, 2020

6 participants