cmd/compile: optimize repeated getg() calls #29671
Comments
I originally made the implementation of
And in that situation, you can't CSE them. I think it would be safe to allow CSEing of Do you have a benchmark that demonstrates any slowdown?
Isn't the runtime the only place where you can call edit: ah, maybe you were referring to
Not really, and I am having a hard time imagining how to come up with one. My main motivation for filing this issue was to counteract the increase in text size and icache pollution caused by inlining those fast paths.
Indeed, it looks like that is the case at the moment.
The new internal ABI could potentially make it possible to reserve a G register on AMD64. Then getg would generate no instruction, as on non-x86 architectures.
With the new ABI we now have a reserved G register (R14). I think we can close this. Feel free to reopen if there is more to address.
While playing around with inlining the fast paths of runtime.lock/unlock, I noticed that repeated getg() calls don't get optimized away. This is especially visible with midstack inlining, because it is more likely that a function calling getg() is inlined into another function calling getg(). This results in something like:

Notice the three loads from FS:-8, two of which are part of the repeated sequence

IIUC, it's true that the value of <R1> could change if the G is migrated to a different thread, but the address of the G itself (<R2>) should not change even if this happens.
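
To make the pattern concrete, here is a rough user-space model of it, not actual runtime code. The real getg() is a compiler intrinsic that cannot be called outside the runtime, so everything below (the `g` and `m` stand-in types, the `tlsLoads` counter, and the `lockFast`/`unlockFast` helpers) is hypothetical and only simulates the cost of each TLS load. It assumes, as argued above, that the G pointer stays the same for the duration of the function, so one load could be reused.

```go
package main

import "fmt"

// m and g are simplified, hypothetical stand-ins for the runtime's
// thread and goroutine descriptors; only the field used here exists.
type m struct{ locks int32 }
type g struct{ m *m }

var (
	current  = &g{m: &m{}} // stands in for the G pointer kept in TLS
	tlsLoads int           // counts simulated TLS loads
)

// getg models the runtime intrinsic: each call costs one simulated TLS load.
func getg() *g {
	tlsLoads++
	return current
}

// lockFast and unlockFast model inlined fast paths that each call getg.
func lockFast() { gp := getg(); gp.m.locks++ }

func unlockFast() { gp := getg(); gp.m.locks-- }

// repeated calls getg directly and then again through both fast paths,
// which is the shape midstack inlining tends to produce.
// It returns the number of simulated TLS loads it performed.
func repeated() int {
	tlsLoads = 0
	gp := getg()
	if gp.m.locks == 0 {
		lockFast()
		unlockFast()
	}
	return tlsLoads
}

// hoisted does the same work but loads the G pointer once and reuses it,
// which is what eliminating the repeated getg calls would amount to.
func hoisted() int {
	tlsLoads = 0
	gp := getg()
	if gp.m.locks == 0 {
		gp.m.locks++
		gp.m.locks--
	}
	return tlsLoads
}

func main() {
	fmt.Println("repeated:", repeated(), "hoisted:", hoisted())
}
```

In this model the first variant performs three simulated loads and the second performs one, which mirrors the three loads from FS:-8 described above.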