
runtime: clearpools causes excessive STW1 time #22331

Closed
bryanpkc opened this issue Oct 18, 2017 · 5 comments
Labels
FrozenDueToAge, NeedsInvestigation, Performance
Comments

@bryanpkc (Contributor) commented:

Our application uses a large number of goroutines, mutexes, and defers. Using Go 1.8.3 on linux/amd64, we observed that the STW1 time is dominated by the runtime.clearpools function in mgc.go. Here is an excerpt from the GC trace:

gc 12 @27.641s 4%: 16+865+0.82 ms clock, 522+6169/9551/1514+26 ms cpu, 8375->8694->5362 MB, 8943 MB goal, 48 P

By instrumenting the runtime, we determined that, out of the 16ms STW1 time, 9.3ms was spent in the loop that disconnects the sched.sudogcache linked list, and 6.7ms was spent in the loop that disconnects the linked lists in sched.deferpool. These O(n) loops end up causing a very long pause.

I understand the reason for originally introducing the loops in #9110. However, looking at tip around the call sites of releaseSudog and freedefer, I couldn't see a case where a released object would still be referenced by a live pointer somewhere else in the system. Are these loops still really necessary?

FWIW, I replaced the loops with a simple zeroing of the heads of the linked lists, and test/fixedbugs/issue9110.go still passed.

I propose making these loops optional to avoid the excessive STW1 time, and enabling them only with a GODEBUG option for debugging purposes.
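As an illustration of the difference described above (this is a standalone sketch, not the runtime's code), the O(n) variant walks a free list and severs every link during STW, while the O(1) replacement simply drops the head pointer:

```go
package main

import "fmt"

type node struct {
	next *node
}

// clearSlow mirrors the O(n) approach: it walks the whole list and breaks
// every link, so the pause grows with the number of pooled objects.
func clearSlow(head **node) {
	for n := *head; n != nil; {
		next := n.next
		n.next = nil
		n = next
	}
	*head = nil
}

// clearFast mirrors the O(1) replacement tried above: only the head is
// zeroed; the nodes stay linked to one another but become unreachable
// from the pool.
func clearFast(head **node) {
	*head = nil
}

func main() {
	// Build two small free lists and clear one each way.
	build := func(n int) (head *node) {
		for i := 0; i < n; i++ {
			head = &node{next: head}
		}
		return
	}
	a, b := build(5), build(5)
	clearSlow(&a)
	clearFast(&b)
	fmt.Println(a == nil, b == nil) // true true
}
```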

@bradfitz (Contributor) commented:

/cc @aclements

@bradfitz changed the title from "runtime.clearpools causes excessive STW1 time" to "runtime: clearpools causes excessive STW1 time" Oct 18, 2017
@bradfitz added this to the Go1.10 milestone Oct 18, 2017
@ianlancetaylor added the NeedsInvestigation label Oct 18, 2017
@rsc modified the milestones: Go1.10 → Go1.11 Nov 22, 2017
@ianlancetaylor modified the milestones: Go1.11 → Go1.12 Jul 9, 2018
@aclements modified the milestones: Go1.12 → Go1.13 Jan 8, 2019
@gopherbot commented:

Change https://golang.org/cl/166957 mentions this issue: sync: internal fixed size lock-free queue for sync.Pool

@gopherbot commented:

Change https://golang.org/cl/166958 mentions this issue: sync: internal dynamically sized lock-free queue for sync.Pool

@gopherbot commented:

Change https://golang.org/cl/166959 mentions this issue: sync: add Pool benchmarks to stress STW and reuse

@gopherbot commented:

Change https://golang.org/cl/166960 mentions this issue: sync: use lock-free structure for Pool stealing

gopherbot pushed a commit that referenced this issue Apr 5, 2019
This is the first step toward fixing multiple issues with sync.Pool.
This adds a fixed size, lock-free, single-producer, multi-consumer
queue that will be used in the new Pool stealing implementation.

For #22950, #22331.

Change-Id: I50e85e3cb83a2ee71f611ada88e7f55996504bb5
Reviewed-on: https://go-review.googlesource.com/c/go/+/166957
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
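To make the commit message concrete, here is a rough sketch of the technique it describes. This is not the runtime's code; the names (Queue, PushHead, PopTail) are illustrative, and it assumes a power-of-two capacity and pointer-typed elements. The head and tail indices are packed into one atomic word so a consumer can claim a slot with a single CAS:

```go
package spmc

import "sync/atomic"

// Queue is a fixed-size, lock-free, single-producer/multi-consumer ring.
// Capacity must be a power of two so uint32 index wraparound stays consistent.
type Queue[T any] struct {
	headTail atomic.Uint64        // head in the high 32 bits, tail in the low 32 bits
	slots    []atomic.Pointer[T]
}

func NewQueue[T any](capacity int) *Queue[T] {
	if capacity <= 0 || capacity&(capacity-1) != 0 {
		panic("capacity must be a power of two")
	}
	return &Queue[T]{slots: make([]atomic.Pointer[T], capacity)}
}

func (q *Queue[T]) Cap() int { return len(q.slots) }

// PushHead adds v at the head. It must only be called by the single producer.
// It reports false if the ring is full (or a just-popped slot has not been
// cleared by its consumer yet).
func (q *Queue[T]) PushHead(v *T) bool {
	ht := q.headTail.Load()
	head, tail := uint32(ht>>32), uint32(ht)
	if head-tail >= uint32(len(q.slots)) {
		return false
	}
	slot := &q.slots[head%uint32(len(q.slots))]
	if slot.Load() != nil {
		return false // previous occupant not fully released yet
	}
	slot.Store(v)
	q.headTail.Add(1 << 32) // publish the new head
	return true
}

// PopTail removes and returns the oldest element. Any number of consumers
// may call it concurrently.
func (q *Queue[T]) PopTail() (*T, bool) {
	for {
		ht := q.headTail.Load()
		head, tail := uint32(ht>>32), uint32(ht)
		if tail == head {
			return nil, false // empty
		}
		// Claim the tail slot by advancing tail with a single CAS.
		next := uint64(head)<<32 | uint64(tail+1)
		if q.headTail.CompareAndSwap(ht, next) {
			slot := &q.slots[tail%uint32(len(q.slots))]
			v := slot.Load()
			slot.Store(nil) // let the producer reuse the slot
			return v, true
		}
	}
}
```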
gopherbot pushed a commit that referenced this issue Apr 5, 2019
This adds a dynamically sized, lock-free, single-producer,
multi-consumer queue that will be used in the new Pool stealing
implementation. It's built on top of the fixed-size queue added in the
previous CL.

For #22950, #22331.

Change-Id: Ifc0ca3895bec7e7f9289ba9fb7dd0332bf96ba5a
Reviewed-on: https://go-review.googlesource.com/c/go/+/166958
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
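Again as an illustration only (not the runtime's implementation), a dynamically sized queue can be built from the fixed-size Queue[T] sketched above: the single producer pushes into the newest ring and grows the chain with a ring of twice the size when it fills, while consumers pop from the oldest ring and unlink it once it is drained:

```go
package spmc

import "sync/atomic"

type chainElt[T any] struct {
	q    *Queue[T]
	next atomic.Pointer[chainElt[T]] // written only by the producer, read by consumers
}

// Chain is a dynamically sized single-producer/multi-consumer queue built
// from a linked list of fixed-size rings.
type Chain[T any] struct {
	head *chainElt[T]                // owned by the single producer
	tail atomic.Pointer[chainElt[T]] // shared with consumers
}

// PushHead must only be called by the single producer.
func (c *Chain[T]) PushHead(v *T) {
	if c.head == nil {
		e := &chainElt[T]{q: NewQueue[T](8)}
		c.head = e
		c.tail.Store(e)
	}
	if c.head.q.PushHead(v) {
		return
	}
	// The newest ring is full: chain on a ring of twice the size and retry.
	e := &chainElt[T]{q: NewQueue[T](2 * c.head.q.Cap())}
	c.head.next.Store(e)
	c.head = e
	e.q.PushHead(v)
}

// PopTail may be called by any number of consumers concurrently.
func (c *Chain[T]) PopTail() (*T, bool) {
	d := c.tail.Load()
	for d != nil {
		// Read next before popping: once next is set the producer never
		// pushes into d again, so an empty d can be unlinked safely below.
		next := d.next.Load()
		if v, ok := d.q.PopTail(); ok {
			return v, true
		}
		if next == nil {
			return nil, false // d is the newest ring and it is empty
		}
		// d is drained and sealed; advance the shared tail past it.
		c.tail.CompareAndSwap(d, next)
		d = next
	}
	return nil, false
}
```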
gopherbot pushed a commit that referenced this issue Apr 5, 2019
This adds two benchmarks that will highlight two problems in Pool that
we're about to address.

The first benchmark measures the impact of large Pools on GC STW time.
Currently, STW time is O(# of items in Pools), and this benchmark
demonstrates 70µs STW times.

The second benchmark measures the impact of fully clearing all Pools
on each GC. Typically this is a problem in heavily-loaded systems
because it causes a spike in allocation. This benchmark stresses this
by simulating an expensive "New" function, so the cost of creating new
objects is reflected in the ns/op of the benchmark.

For #22950, #22331.

Change-Id: I0c8853190d23144026fa11837b6bf42adc461722
Reviewed-on: https://go-review.googlesource.com/c/go/+/166959
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
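As a hedged sketch of what such benchmarks can look like with the public sync.Pool API (these are not the benchmarks added in CL 166959; names and constants are illustrative), the first times a forced GC while a Pool is heavily populated, and the second makes object construction deliberately expensive so pool clearing shows up in ns/op:

```go
package pool_test

import (
	"runtime"
	"sync"
	"testing"
	"time"
)

// BenchmarkPoolSTW fills a Pool with many items each iteration and records
// the worst GC pause, measured via MemStats.PauseTotalNs around a forced GC.
func BenchmarkPoolSTW(b *testing.B) {
	var p sync.Pool
	var worstNs uint64
	for i := 0; i < b.N; i++ {
		for j := 0; j < 100000; j++ {
			p.Put(new([64]byte))
		}
		var before, after runtime.MemStats
		runtime.ReadMemStats(&before)
		runtime.GC()
		runtime.ReadMemStats(&after)
		if pause := after.PauseTotalNs - before.PauseTotalNs; pause > worstNs {
			worstNs = pause
		}
	}
	b.ReportMetric(float64(worstNs), "worst-pause-ns")
}

// BenchmarkPoolExpensiveNew uses a deliberately slow New function, so objects
// dropped when the GC clears the Pool must be rebuilt at a cost that shows up
// directly in ns/op.
func BenchmarkPoolExpensiveNew(b *testing.B) {
	p := sync.Pool{New: func() any {
		time.Sleep(time.Millisecond) // stand-in for an expensive constructor
		return new([64]byte)
	}}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			x := p.Get()
			p.Put(x)
		}
	})
}
```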
@golang locked and limited conversation to collaborators Apr 4, 2020