Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: getMCache causes allocation fast path regression #42305

Closed
mknyszek opened this issue Oct 30, 2020 · 11 comments
Closed

runtime: getMCache causes allocation fast path regression #42305

mknyszek opened this issue Oct 30, 2020 · 11 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Performance release-blocker
Milestone

Comments

@mknyszek
Copy link
Contributor

bent benchmarks revealed a regression in some community microbenchmarks on 10/26. Bisecting, I discovered that c02134a is to blame.

That commit is very simple, so my best guess is it's the fact that we're looking up the M and the P a second time. Also, the compiler doesn't seem to inline getMCache which is unfortunate and could be part of it. Not sure why, though. Maybe mallogc is too big?

Anyway, we should fix this, probably by manually inlining it back into mallocgc and leaving a comment pointing to getMCache? There are a couple other circumstances where we do this extra dereference where it would be nice to not do so, so maybe getMCache should take a p pointer that might be nil? That means potentially doing a second lookup since it's ambiguous whether we don't have a P or just haven't looked it up yet, but even if we do we're only doing it when bootstrapping which is probably fine.

@prattmic WDYT?

@mknyszek mknyszek added Performance NeedsFix The path to resolution is known, but the work has not been done. release-blocker labels Oct 30, 2020
@mknyszek mknyszek added this to the Go1.16 milestone Oct 30, 2020
@mknyszek mknyszek self-assigned this Oct 30, 2020
@mknyszek
Copy link
Contributor Author

OK I think we've pretty much confirmed it's the additional function call, not reloading the P.

@mknyszek
Copy link
Contributor Author

What's preventing getMCache from inlining seems to be the throw inside. If that gets treated like a regular call, then it's still not quite enough. But a throw is like a panic, but strictly rarer because a throw can't be recovered from. Perhaps it should be cheaper?

CC @dr2chase who knows all about mid-stack inlining and inlining costs.

@josharian
Copy link
Contributor

Aside: Fixing #24686 might let us get rid of mcache entirely. I never managed to get it implemented, but I don't know the runtime as well as you. :)

@Helflym
Copy link
Contributor

Helflym commented Nov 2, 2020

There are some failures in last builds for AIX which seem to related to this issue:
https://build.golang.org/log/fa90b60048ac7f99bc84523d236948a1f090b4c3
https://build.golang.org/log/a5515473b2094744ca0ff0d19c330fa66ca6184a

I've searched really quickly through the buildfarm dashboard but it didn't find any other appearance.

@mknyszek
Copy link
Contributor Author

mknyszek commented Nov 2, 2020

There are some failures in last builds for AIX which seem to related to this issue:
https://build.golang.org/log/fa90b60048ac7f99bc84523d236948a1f090b4c3
https://build.golang.org/log/a5515473b2094744ca0ff0d19c330fa66ca6184a

I've searched really quickly through the buildfarm dashboard but it didn't find any other appearance.

Huh. That is a problem (write barrier without a P...?) but separate from this one. Just the fact that allocSpan now actually cares that we have a P.

@mknyszek
Copy link
Contributor Author

mknyszek commented Nov 2, 2020

@Helflym Could you file a new issue for that and assign it to me?

@Helflym
Copy link
Contributor

Helflym commented Nov 2, 2020

@Helflym Could you file a new issue for that and assign it to me?

Done with #42339. But it can't assign it to you. I don't have the permissions I guess

@mknyszek
Copy link
Contributor Author

mknyszek commented Nov 2, 2020

So there are a number of ways to fix this regression. First is to make throw have cost 1, so getMCache gets inlined. Second is to manually inline getMCache and leave a comment. Third is to reorganize getMCache to put the throw in some //go:noinline function and hope that the call cost (57) is low enough that it still gets inlined.

For (1) and (3), we should add getMCache to the list of functions we want to make sure get inlined. I'm leaning toward (2) as the safest solution for the 1.16 release, though (3) is fine also (albeit a bit ugly).

@prattmic @dr2chase WDYT?

@mknyszek
Copy link
Contributor Author

mknyszek commented Nov 2, 2020

A fourth alternative is to change getMCache to return nil if it can't get one instead of throwing. Then it would be up to the caller to throw if necessary. This would be useful to handle #42339 also since apparently allocSpan needs to be able to run without a P. It requires updating each callsite, but because it also helps with another issue, this seems like a good idea to me.

@mknyszek
Copy link
Contributor Author

mknyszek commented Nov 2, 2020

I think I'm gonna go with 4.

(Credit to @prattmic for coming up with option 4 last week.)

@gopherbot
Copy link

Change https://golang.org/cl/267157 mentions this issue: runtime: make getMCache inlineable

@golang golang locked and limited conversation to collaborators Nov 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Performance release-blocker
Projects
None yet
Development

No branches or pull requests

4 participants