cmd/compile: optimize CAS loops #29716
Labels: compiler/runtime, NeedsInvestigation, Performance
Note: I'm not 100% sure the transformation I'm proposing is correct or profitable. If it isn't, feel free to close this issue.
Consider the typical Go CAS loop:
On x64, this gets compiled to (https://godbolt.org/z/7905Kp):
Three things jump out:

1. When `A != oldA`, `cmpxchgq` already loads the current value of `A` into `AX`, so the first instruction of the loop (reloading `A`) can be skipped from the second iteration onwards.
2. The `seteq` and `testb` after `cmpxchgq` are not required: they, and the `jeq`, can be replaced with a `jnz main_pc29`.
3. The address of `A` is reloaded on every iteration; it should be hoisted out of the loop. (In the link above `f()` is marked noinline, so it makes sense to reload the address of `A` once `f` returns; but even if you remove the noinline, the address is still reloaded, even though the register is not reused for anything else.) Possibly related: #15808, "cmd/compile: loads/constants not lifted out of loop".

Rewritten, the loop could become:
Since changing `atomic.CompareAndSwap*` to return the reloaded value instead of a `bool` is likely not an option, it would be great if the compiler learned to perform this kind of transformation (when safe).
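To make the proposed loop shape concrete at the Go level, the sketch below models a compare-and-swap that reports the value it observed, the way `cmpxchgq` leaves it in `AX`. `sync/atomic` offers no such primitive, so `cell` and `compareExchange` are hypothetical, and a mutex stands in for the hardware's atomicity. The model captures the semantics only, not the performance. The point is the shape of `update`: `A` is loaded once before the loop, and each failed exchange feeds its observed value into the next attempt instead of reloading.

```go
package main

import (
	"fmt"
	"sync"
)

// cell is a hypothetical memory word whose compare-and-swap returns the
// value it observed. The mutex makes each operation atomic; this models
// CMPXCHG's behaviour, it is not a fast implementation.
type cell struct {
	mu sync.Mutex
	v  uint64
}

func (c *cell) load() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.v
}

// compareExchange stores desired if the current value equals expected.
// Either way it returns the value it observed and whether the swap happened.
func (c *cell) compareExchange(expected, desired uint64) (uint64, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.v == expected {
		c.v = desired
		return expected, true
	}
	return c.v, false
}

// f is a hypothetical update function.
func f(x uint64) uint64 { return x + 1 }

// update is the rewritten loop: the initial load happens once, outside the
// loop, and a failed exchange reuses the value it observed.
func update(c *cell) {
	old := c.load()
	for {
		cur, ok := c.compareExchange(old, f(old))
		if ok {
			return
		}
		old = cur // value observed by the failed exchange; no reload needed
	}
}

func main() {
	var c cell
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				update(&c)
			}
		}()
	}
	wg.Wait()
	// Each successful exchange adds exactly one, so this prints 8000.
	fmt.Println(c.load())
}
```

The retry path never calls `load` again, which corresponds to skipping the reload of `A` from the second iteration onwards.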