Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize bits.OnesCount(x) == 1 #38547

Open
josharian opened this issue Apr 20, 2020 · 7 comments
Open

cmd/compile: optimize bits.OnesCount(x) == 1 #38547

josharian opened this issue Apr 20, 2020 · 7 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@josharian
Copy link
Contributor

josharian commented Apr 20, 2020

It'd be nice for the compiler to be able to optimize bits.OnesCount(x) == 1 to x&(x-1) != 0 && x != 0.

This is a bit tricky, because by the time we get to the rewrite rules, there's already a branch to check cpu feature availability.

If we figure this out, there might be other math/bits function result comparisons we want to optimize.

Implementation ideas welcome.

@josharian josharian added this to the Unplanned milestone Apr 20, 2020
@gopherbot
Copy link

Change https://golang.org/cl/229127 mentions this issue: cmd/compile: use cheaper implementation of oneBit

@bradfitz
Copy link
Contributor

I guess you don't like doing it earlier in the compiler before it reaches SSA? Probably cost outweighs the benefits, and clutters earlier phases?

gopherbot pushed a commit that referenced this issue Apr 21, 2020
Updates #38547

file    before    after     Δ       %       
compile 19678112  19669808  -8304   -0.042% 
total   113143160 113134856 -8304   -0.007% 

Change-Id: I5f8afe17401dbdb7c7b3d66d95fe40821c499a92
Reviewed-on: https://go-review.googlesource.com/c/go/+/229127
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@josharian
Copy link
Contributor Author

cost outweighs the benefits, and clutters earlier phases?

Exactly.

@gopherbot
Copy link

Change https://golang.org/cl/229197 mentions this issue: cmd/compile: use cheaper implementation of oneBit

gopherbot pushed a commit that referenced this issue Apr 21, 2020
This is the second attempt. The first attempt was CL 229127,
which got rolled back by CL 229177, because it caused
an infinite loop during compilation on some platforms.
I didn't notice that the trybots hadn't completed when I submitted; mea culpa.

The bug was that we were checking x&(x-1)==0, which is also true of 0,
which does not have exactly one bit set.
This caused an infinite rewrite rule loop.

Updates #38547

file    before    after     Δ       %
compile 19678112  19669808  -8304   -0.042%
total   113143160 113134856 -8304   -0.007%

Change-Id: I417a4f806e1ba61277e31bab2e57dd3f1ac7e835
Reviewed-on: https://go-review.googlesource.com/c/go/+/229197
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
@andybons andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 21, 2020
@quasilyte
Copy link
Contributor

quasilyte commented Nov 30, 2021

As a side, but related note, at least with GOAMD64=v2 and above there is no CPU flag check, so we end up with this code instead:

XORL	CX, CX
POPCNTQ	AX, CX
CMPQ	CX, $1
SETEQ	AL

No call to math/bits.OnesCount(SB) is emited.
For the completeness, the proposed replacement pattern x&(x-1) != 0 && x != 0 is compiled as:

LEAQ	-1(AX), CX
TESTQ	CX, AX
JEQ	17
TESTQ	AX, AX
SETNE	CL
JMP	19
XORL	CX, CX
MOVL	CX, AX

Tested on Go tip (well, not the latest one, but close enough: fad67f8).

@josharian
Copy link
Contributor Author

That replacement compiled code should have no jumps in it. We must be missing a rewrite rule or three.

@quasilyte
Copy link
Contributor

OK, sorry for bumping this up again, but I was staring at the originally proposed replacement and it doesn't look 100% correct.
Or maybe I miscalculated something.

Perhaps it should look like this?

- ((x&(x-1)) != 0) && x != 0
+ ((x&(x-1)) == 0) && x != 0

With that in mind, here is a comparison with different compilers output.
Clang (https://godbolt.org/z/o9PzM49TG):

        lea     eax, [rdi - 1]
        test    edi, eax
        sete    cl
        test    edi, edi
        setne   al
        and     al, cl
        ret

GCC:

        lea     eax, [rdi-1]
        test    eax, edi
        sete    al
        test    edi, edi
        setne   dl
        and     eax, edx
        ret

Go (https://godbolt.org/z/nfKjKe93a):

        LEAQ    -1(AX), CX
        TESTQ   CX, AX
        JEQ     f_pc17
        TESTQ   AX, AX
        SETNE   CL
        JMP     f_pc19
f_pc17:
        XORL    CX, CX
f_pc19:
        MOVL    CX, AX
        RET

It looks like Go should be able to remove some branches here too.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

5 participants