
runtime: GC turning on/off disruptive to scheduler and locality #20307

Open
josharian opened this issue May 10, 2017 · 10 comments
Labels: compiler/runtime, NeedsInvestigation, Performance, ToolSpeed
Milestone: Unplanned

Comments

@josharian
Contributor

I generated a trace of the concurrent compiler backend, and see some unfortunate things in it.

$ go tool compile -traceprofile=t.p -c=8 fmt/{doc,format,print,scan}.go
$ go tool trace t.p

Screenshot:

[screenshot: compile_fmt_trace_zoomed]

On the left hand side of this screenshot is the beginning of concurrent backend compilation. It continues off-screen to the right. The disconcerting things I see are:

  • Large chunks of idle time. For example, 5 ms of idle time in proc 7, with plenty of work available to be scheduled, right when GC turns off, around 73ms.
  • Goroutines being shuffled between procs when GC flips on/off (the clear vertical lines in the trace), which is bad for data locality and caches. Should the STW phase note where Gs are and attempt to put them back there when it is done? Or some such?

For reference, the concurrency (fan-out, etc.) is all contained in a single function, compileFunctions, in pgen.go: https://github.com/golang/go/blob/master/src/cmd/compile/internal/gc/pgen.go#L255

Apologies if this is a duplicate. I can split this into two issues if that would be helpful.

cc @aclements @RLH

@josharian
Contributor Author

Ah. I can fix the idle time by using a buffered channel. (I shouldn't have to, but it works.) The G/proc shuffling remains. New trace:

[trace screenshot: with_buffered_chan]

@gopherbot

CL https://golang.org/cl/43110 mentions this issue.

josharian added a commit to josharian/go that referenced this issue May 10, 2017
Updates golang#20307

With -c=2:

name        old time/op       new time/op       delta
Template          140ms ± 3%        139ms ± 4%  -1.06%  (p=0.003 n=50+50)
Unicode          81.1ms ± 4%       81.9ms ± 4%  +0.96%  (p=0.006 n=50+49)
GoTypes           375ms ± 3%        374ms ± 3%    ~     (p=0.094 n=48+48)
Compiler          1.69s ± 2%        1.68s ± 2%  -0.41%  (p=0.004 n=49+48)
SSA               3.05s ± 1%        3.05s ± 2%    ~     (p=0.953 n=47+49)
Flate            86.3ms ± 2%       85.9ms ± 2%  -0.49%  (p=0.011 n=49+48)
GoParser         99.5ms ± 3%       99.3ms ± 3%    ~     (p=0.394 n=48+49)
Reflect           262ms ± 3%        261ms ± 3%    ~     (p=0.354 n=47+49)
Tar              81.4ms ± 3%       79.7ms ± 4%  -1.98%  (p=0.000 n=47+50)
XML               133ms ± 3%        133ms ± 3%    ~     (p=0.992 n=50+49)
[Geo mean]        236ms             235ms       -0.36%

name        old user-time/op  new user-time/op  delta
Template          249ms ± 5%        242ms ± 7%  -2.61%  (p=0.000 n=48+50)
Unicode           111ms ± 4%        111ms ± 6%    ~     (p=0.407 n=46+47)
GoTypes           753ms ± 2%        748ms ± 3%  -0.65%  (p=0.010 n=48+50)
Compiler          3.28s ± 2%        3.27s ± 2%  -0.40%  (p=0.026 n=49+47)
SSA               7.03s ± 2%        7.01s ± 3%    ~     (p=0.154 n=45+50)
Flate             154ms ± 3%        154ms ± 3%    ~     (p=0.306 n=49+49)
GoParser          180ms ± 4%        179ms ± 4%    ~     (p=0.148 n=48+48)
Reflect           427ms ± 2%        428ms ± 3%    ~     (p=0.502 n=46+49)
Tar               142ms ± 5%        135ms ± 9%  -4.83%  (p=0.000 n=46+50)
XML               247ms ± 3%        247ms ± 4%    ~     (p=0.921 n=49+49)
[Geo mean]        426ms             422ms       -0.92%


Change-Id: I4746234439ddb9a7e5840fc783b8857da6a4a680
gopherbot pushed a commit that referenced this issue May 10, 2017
Updates #20307

Change-Id: I4746234439ddb9a7e5840fc783b8857da6a4a680
Reviewed-on: https://go-review.googlesource.com/43110
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@RLH
Contributor

RLH commented May 10, 2017 via email

@josharian
Contributor Author

It's running on my laptop (8-core 2.9 GHz Intel Core i7, macOS), though presumably this happens on all hardware, not just mine.

If it's easy to hack together a patch (unsightly or not) that experiments with remembering G locations, I'm happy to benchmark it. The concurrent compiler seems like a good test case for it: NumCPU long-running, memory-heavy calculations.

> That said the GC tracing is likely to do far more to smash the cache and TLB than moving goroutines between cores on the same chip.

Perhaps so. Seems worth an experiment, if feasible.

@griesemer
Contributor

griesemer commented May 10, 2017

@josharian Do you know why the buffered channel makes such a big difference? Is it because there's a lot of time wasted (for reasons not yet understood) when context-switching (unbuffered case), and there's less context switching needed when there's a buffered channel?

@josharian
Contributor Author

@griesemer I think that's exactly it--those gaps in the first trace are (I suspect) caused by waiting for the scheduler to coordinate handing over work to a new worker, which requires scheduling both the producer and the consumer. The buffered channel means that (to a first approximation) there's always work available for a worker without needing the scheduler, thus no gaps.

@RLH
Contributor

RLH commented May 10, 2017 via email

@josharian
Contributor Author

> If I'm reading the goroutine line in the trace correctly there are runnable goroutines that aren't being run during these gaps.

That's correct.

> As far as adding CPU affinity to our goroutines I'm a bit nervous.

I'm not suggesting CPU affinity. (At least, I don't think I am.) I'm suggesting that when we go into STW, we record which Ps are running which Gs, and put those Gs back on those Ps when we leave STW. That's it. But maybe that does involve breaking many layers of abstraction; if so, I understand your reluctance.

@aclements
Member

> I'm suggesting that when we go into STW, we record which Ps are running which Gs, and put those Gs back on those Ps when we leave STW.

We do that already, actually. STW leaves the per-P run queues in place. However, it does kick the currently running G to the end of the P's run queue (because it's using the regular preemption path), which is probably why it doesn't look like we have affinity in the execution traces.

That might be relatively easy to hack in a fix for, but it's not going to matter much because we definitely don't maintain P-to-M affinity over STW, so the goroutine will almost certainly be resumed on a different OS thread even if it's still on the same P. That's harder to fix.

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 13, 2018
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Apr 13, 2018
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@LeGamerDc

Yes, and the goroutine shuffling during GC, combined with a NUMA architecture, will make things even worse.
