runtime: scavenger pacing fails to account for fragmentation [1.13 backport] #34149
@mknyszek, @aclements: could you describe the conditions under which this performance degradation occurs? (Is it predictable? Does it show up during testing? Is there a workaround available? Do we have a rough estimate for the fraction of users affected?)
Yes. If the application's span fragmentation (the fraction by which heap_inuse exceeds heap_alloc) reaches 10% or more, the scavenger can spiral into a state in which every free span is scavenged, leading to a large number of page faults and syscalls.
I don't have a specific test, but I can recreate the problem with both large and small programs. I'd have to write a short program that reproduces it, but that shouldn't be difficult to do.
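For concreteness, here is a minimal sketch of what such a reproducer might look like (the object size and retention pattern are illustrative assumptions, not the program referenced above). It fragments spans by retaining only a sparse subset of allocations, then reads the resulting heap_inuse/heap_alloc gap from runtime.MemStats:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Allocate many small objects, then retain only every 16th one.
	// With 512-byte objects (16 per 8 KiB span), dropping the rest
	// leaves many spans mostly empty but still in use, so heap_inuse
	// stays high while heap_alloc falls.
	const n = 1 << 20
	retained := make([][]byte, 0, n/16)
	for i := 0; i < n; i++ {
		b := make([]byte, 512)
		if i%16 == 0 {
			retained = append(retained, b)
		}
	}
	runtime.GC()

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	frag := 1 - float64(m.HeapAlloc)/float64(m.HeapInuse)
	fmt.Printf("heap_inuse=%d heap_alloc=%d fragmentation=%.1f%%\n",
		m.HeapInuse, m.HeapAlloc, 100*frag)
	runtime.KeepAlive(retained)
}
```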
No. There is no way to work around this because it's embedded in the runtime's pacing. You'd have to somehow keep your application's span fragmentation under 10%, which isn't reasonable to ask of anyone.
No, and I'm not sure exactly how we'd be able to get that. We know for a fact that it severely affects Kubernetes' API latency and is one reason they're blocked on moving to Go 1.13 (see #32828). There is another (medium-sized, I suppose?) Google application which consistently took 80% longer to run on Go 1.13 as a direct consequence of this issue. The patch on the non-backport issue fixes this behavior and brings performance back to Go 1.12 levels.
By “Is it predictable?” and “Does it show up during testing?”, I meant more, “will users be able to predict and/or identify that they are affected?” (That is: this seems more important to backport if it can crop up suddenly after a user has already tested and deployed a new build.)
Ah, I think I understand. Then it's not predictable. It can certainly show up suddenly in production, without anyone noticing while running tests, running microbenchmarks, or even a smaller version of the full application. It's as difficult to predict as how much heap fragmentation your application produces, which may be a function of input in some cases.
I agree that this should be backported for the reasons @mknyszek laid out. Specifically, the CL to backport is https://golang.org/cl/193040.
Change https://golang.org/cl/198487 mentions this issue: |
[release-branch.go1.13] runtime: redefine scavenge goal in terms of heap_inuse

This change makes it so that the scavenge goal is defined primarily in terms of heap_inuse at the end of the last GC rather than next_gc. The reason behind this change is that next_gc doesn't take fragmentation into account, and we can fall into a situation where the scavenger thinks it should have work to do but there's no free and unscavenged memory available.

In order to ensure the scavenge goal still tracks next_gc, we multiply heap_inuse by the ratio between the current heap goal and the last heap goal, which describes whether the heap is growing or shrinking, and by how much.

Finally, this change updates the documentation for scavenging and elaborates on why the scavenge goal is defined the way it is.

Fixes #34149

Change-Id: I8deaf87620b5dc12a40ab8a90bf27932868610da
Reviewed-on: https://go-review.googlesource.com/c/go/+/193040
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
(cherry picked from commit 9b30811)
Reviewed-on: https://go-review.googlesource.com/c/go/+/198487
Run-TryBot: Andrew Bonventre <andybons@golang.org>
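For readers skimming the commit message, here is a rough sketch of the goal computation it describes (the names and standalone form are assumptions for illustration, not the runtime's actual code):

```go
package main

import "fmt"

// scavengeGoal sketches the pacing rule described above: heap_inuse at
// the end of the last GC, scaled by the ratio of the current heap goal
// (next_gc) to the previous one, so the goal tracks heap growth or
// shrinkage while still accounting for fragmentation.
func scavengeGoal(heapInuse, heapGoal, lastHeapGoal uint64) uint64 {
	return uint64(float64(heapInuse) * float64(heapGoal) / float64(lastHeapGoal))
}

func main() {
	// E.g. 100 MiB of spans in use, heap goal shrinking from 200 MiB
	// to 180 MiB: the scavenge goal shrinks proportionally to 90 MiB.
	fmt.Println(scavengeGoal(100<<20, 180<<20, 200<<20)) // 94371840
}
```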
Closed by merging cd951ae to release-branch.go1.13.
@mknyszek requested issue #34048 to be considered for backport to the next 1.13 minor release.