Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize CAS loops #29716

Open
CAFxX opened this issue Jan 13, 2019 · 1 comment
Open

cmd/compile: optimize CAS loops #29716

CAFxX opened this issue Jan 13, 2019 · 1 comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@CAFxX
Copy link
Contributor

CAFxX commented Jan 13, 2019

Note: I'm not 100% sure the transformation I'm proposing is correct, or profitable. If it's not feel free to close.

Consider the typical Go CAS loop:

    for {
        oldA := atomic.LoadUint64(&A)
        newA := f(oldA)
        if atomic.CompareAndSwapUint64(&A, oldA, newA) {
            return
        }
    }

On x64, this gets compiled to: (https://godbolt.org/z/7905Kp)

main_pc29:
        movq    "".A(SB), AX
		// (compute newA, store it in CX...) 
        leaq    "".A(SB), DX
        lock
        cmpxchgq        CX, (DX)
        seteq   AL
        testb   AL, AL
        jeq     main_pc29
        // (return...)

There are 3 things that jump out:

  • cmpxchgq, if A != oldA, already reloads the updated value of A in AX, so the first instruction of the loop (reloading A) can be skipped from the second iteration onwards
  • the seteq and testb after cmqxchgq are not required (they, and the jeq can be replaced with a jnz main_pc29)
  • the address of A is reloaded at every iteration: it should be hoisted outside of the loop (note that in the link above f() is marked noinline, so it makes sense to reload the address of A once f returns; but even if you remove the noinline the address is still reloaded even though the register is not reused for other purposes) (cmd/compile: loads/constants not lifted out of loop #15808?)

Rewritten, the loop could become:

        movq    "".A(SB), AX   // atomic.LoadUint64(&A)
        leaq    "".A(SB), DX   // DX = &A
main_pc29:
		// (compute newA, store it in CX...) 
        lock
        cmpxchgq        CX, (DX)
        jnz     main_pc29
        // (return...)

Since modifying atomic.CompareAndSwap* to return the reloaded value instead of a bool is likely not an option, it would be great if the compiler learned to do this kind of transformation (when safe).

@randall77
Copy link
Contributor

the seteq and testb after cmqxchgq are not required (they, and the jeq can be replaced with a jnz main_pc29)

This would be nice to fix. There's even a TODO in the code:

		// TODO: have these return flags instead of bool.  The current system generates:
		//    CMPXCHGQ ...
		//    SETEQ AX
		//    CMPB  AX, $0
		//    JNE ...
		// instead of just
		//    CMPXCHGQ ...
		//    JEQ ...
		// but we can't do that because memory-using ops can't generate flags yet
		// (flagalloc wants to move flag-generating instructions around).

The restriction mentioned may still be a problem. Memory-using ops can now generate flags, but I think it would still be a problem for memory-generating ops.

cmpxchgq, if A != oldA, already reloads the updated value of A in AX, so the first instruction of the loop (reloading A) can be skipped from the second iteration onwards

the address of A is reloaded at every iteration: it should be hoisted outside of the loop (note that in the link above f() is marked noinline, so it makes sense to reload the address of A once f returns; but even if you remove the noinline the address is still reloaded even though the register is not reused for other purposes) (#15808?)

I'm not particularly worried about either of these. The loop should only execute once in typical situations (and if it doesn't, you have bigger problems).

@julieqiu julieqiu added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 28, 2019
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Status: Triage Backlog
Development

No branches or pull requests

6 participants