Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: ~5% regression in ParallelizeUntil/pieces:1000,workers:10,chunkSize:1-8 sec/op at 450ecbe #64455

Open
rhysh opened this issue Nov 29, 2023 · 2 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@rhysh
Copy link
Contributor

rhysh commented Nov 29, 2023

From the performance dashboard, https://perf.golang.org/dashboard/?benchmark=regressions, I found a performance regression on a microbenchmark of contended channel operations (which involve a contended runtime.mutex), https://perf.golang.org/dashboard/?benchmark=ParallelizeUntil/pieces:1000,workers:10,chunkSize:1-8&unit=sec/op .

https://github.com/kubernetes/client-go/blob/v0.22.2/util/workqueue/parallelizer_test.go
https://github.com/kubernetes/client-go/blob/v0.22.2/util/workqueue/parallelizer.go

The benchmark runs on GOOS=linux, GOARCH=amd64 with GOMAXPROCS=8. It creates 10 workers which consume 1000 trivial work units from a shared channel. I've reproduced the regression on linux/amd64 and confirmed that it's introduced by https://go.dev/cl/544195 (and its predecessor, https://go.dev/cl/528657).

The magnitude of the regression is a ~5% slowdown on contended channel operations. On the hardware I used to test, they move from 174ns each for 10 workers on 8 threads (174µs for each b.N iteration, representing 1000 items) to 184ns.

CPU profiles show that extra 10ns goes to:

  • nanotime (to mark the start and end of contended lock acquisitions, sampled at gTrackingPeriod)
  • function call overhead
  • the bookkeeping that's now part of all runtime.unlock2 calls regardless of contention (to see if the M is storing contention info that needs to be moved to the mprof map)

Note that this code is active even with the default of GODEBUG=profileruntimelocks=0, because https://go.dev/cl/544195 reports the magnitude of runtime-internal lock contention as part of the /sync/mutex/wait/total:seconds metric.

I've been able to reduce the overhead through sound changes (reducing the sample rate, structuring the function calls to convince the compiler to inline the fast paths into runtime.lock2 and runtime.unlock2). I've been able to eliminate the overhead through an additional unsound change (to have runtime.unlock2 only interact with the lock profile when it's waking another thread). That change isn't sound because promoting data into the mprof structure is only safe when we're unlocking the M's last lock (which might not be one for which another thread is waiting), but it's similar to the Go 1.23 plans that @prattmic and I discussed, wherein we'd capture the stack of the unlocking thread and give it weight equivalent to how long it had delayed the other threads.

I'm not sure how much of a problem 10ns in this microbenchmark is in the big picture. But the above is what I learned when looking into it.

CC @prattmic @mknyszek
CC @golang/runtime

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 29, 2023
@prattmic
Copy link
Member

Thanks for taking a look. Whichever improvements we do, right now I'm of the opinion that a 5% regression in heavily-contended microbenchmark isn't worth fixing in the freeze and can wait until the next release. My opinion may change if there are real-world applications affected.

@dmitshur dmitshur added this to the Backlog milestone Nov 30, 2023
@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 30, 2023
@mknyszek
Copy link
Contributor

mknyszek commented Dec 6, 2023

In triage, we agree with @prattmic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Development

No branches or pull requests

5 participants