
runtime: stronger affinity between G ↔ P ↔ M ↔ CPU? #65694

Open
prattmic opened this issue Feb 13, 2024 · 2 comments
Labels: compiler/runtime, NeedsInvestigation, Performance
Milestone: Backlog

@prattmic (Member) commented:

Currently the runtime makes some attempts to maintain affinity between these resources:

  • When Gs are descheduled, they are usually placed on the local P run queue, to run again on the same P later (if they aren't stolen).
  • Ps remain on the same M until they stop.
  • Ms don't do anything to explicitly attempt CPU affinity, but OS schedulers generally do this.

There are cases where we explicitly do not maintain affinity:

  • Gs are occasionally placed on the global run queue, making them more likely to move to a different P (this is intentional, but perhaps not ideal).
  • Stopped Ps have no affinity to an M. This is most notable across STW. STW stops all Ps. When the world starts again, the Ps all move to arbitrary Ms.
  • GC mark workers are started during a STW, and schedule with high priority, making them likely to run on a P that was previously running a G, even if there are idle Ps. This forces the G to move to a different P.

The lack of perfect affinity is typically readily evident when viewing an execution trace, where you can see Gs moving around, even when there are idle resources. It is especially evident across a STW, but movement can be seen during normal execution as well.

In the example below, we can see G11 through G19 all moving between threads several times (thanks @aktau for raising this).
[trace screenshot: G11–G19 migrating between threads]

These migrations are certainly a (minor) annoyance when viewing traces.

They may also be a source of performance degradation. For example, CPU caches are likely empty after a migration, causing additional cache misses. Perhaps it could even have NUMA effects: if a G's allocations came from a P's mcache with spans that the OS placed on one NUMA node, then moving to a different M/CPU makes those memory accesses slower.

None of these potential performance effects have been measured to determine whether they are noticeable. For example, migration will clearly have cache effects, but migrations tend to occur at intervals of 10ms or much longer. It isn't clear that cache effects would be noticeable at these long time scales. More research is required.

cc @mknyszek @aclements @aktau

@prattmic prattmic added Performance compiler/runtime Issues related to the Go compiler and/or runtime. labels Feb 13, 2024
@prattmic prattmic added this to the Backlog milestone Feb 13, 2024
@Jorropo (Member) commented Feb 13, 2024:


Should we expect CPU caches to still hold useful data after STW?
My naive guess is that STW combined with the GC scan would trash all the cached data with more or less random data (whatever the GC happens to scan last).

@prattmic (Member, Author) replied:


> Should we expect CPU caches to still hold useful data after STW?

I don't know, that's yet another thing to measure. I wouldn't be surprised though. STW at GC start doesn't do much (no scanning), and all of the other threads are idle, so those CPUs may just be idle.

@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 14, 2024