Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize bitset tests #31904

Closed
rillig opened this issue May 8, 2019 · 7 comments
Closed

cmd/compile: optimize bitset tests #31904

rillig opened this issue May 8, 2019 · 7 comments
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@rillig
Copy link
Contributor

rillig commented May 8, 2019

What version of Go are you using (go version)?

go version go1.12.1 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\rillig\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\rillig\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Program Files\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\rillig\Program Files\cygwin\tmp\go-build152819318=/tmp/go-build -gno-record-gcc-switches

What did you do?

https://play.golang.org/p/G1EhaxsYsAV

What did you expect to see?

When a bitset is queried for a single bit, it doesn't matter whether set & q is compared to 0 or to the expected bit. Comparing to 0 saves an instruction on x86_64 and doesn't modify a register.

Therefore I expected the generated code to use TESTQ $0x8, AX instead of the combined ANDQ and CMPQ.

I didn't measure how often this pattern appears in the wild, but when I saw the generated code I immediately wondered why it uses two instructions when the same effect can be achieved with a single instruction and even fewer side effects.

What did you see instead?

  bitset.go:13          MOVQ os.Args+8(SB), AX
  bitset.go:14          NOPL
  bitset.go:14          XORPS X0, X0
  bitset.go:14          MOVUPS X0, 0x40(SP)
  bitset.go:14          LEAQ runtime.types+65664(SB), CX
  bitset.go:14          MOVQ CX, 0x40(SP)
  bitset.go:9           ANDQ $0x8, AX              <-- seems inefficient
  bitset.go:9           CMPQ $0x8, AX              <-- seems inefficient
  bitset.go:9           SETE AL
  bitset.go:14          MOVZX AL, AX
@randall77
Copy link
Contributor

Indeed. We currently handle x & 8 == 0 but not x & 8 == 8.

@gopherbot
Copy link

Change https://golang.org/cl/175957 mentions this issue: cmd/compile: optimize bitset tests

@tmthrgd
Copy link
Contributor

tmthrgd commented May 8, 2019

This optimisation is only valid for x & y == y if y has exactly one bit set (i.e. is a power of two).

See this example: https://play.golang.org/p/fAqS02z3Fgm

@cuonglm
Copy link
Member

cuonglm commented May 8, 2019

@tmthrgd it's right, finding a way to fix it.

@randall77 is there anyway to check a *ssa.Value is power of two? I see there's a function isPowerOfTwo, but it gets an int64 instead.

@randall77
Copy link
Contributor

isPowerOfTwo is probably the right one to use.
You're going to need to make the rules require that one argument is a constant.

It would be nice if you could do this optimization in the generic rules, something like:

(Eq64 (And64 x (Const64 [y])) (Const64 [y])) && isPowerOfTwo(y) -> (Ne64 (And64 x (Const64 [y])) (Const64 [0]))

(And 32/16/8 versions, Ne -> Eq also)
That way all the architectures get the optimization.

@cuonglm
Copy link
Member

cuonglm commented May 8, 2019

@randall77 interesting, with these rules added in generic.rules:

// Optimize bitsets
(Eq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [y])) && isPowerOfTwo(y)
  -> (Neq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [0]))
(Neq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [y])) && isPowerOfTwo(y)
  -> (Eq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [0]))

For return set&8 == 8, the assembly code generate using BTL instead of TESTQ:

"".AllBitsSet STEXT nosplit size=15 args=0x18 locals=0x0
	0x0000 00000 (main.go:8)	TEXT	"".AllBitsSet(SB), NOSPLIT|ABIInternal, $0-24
	0x0000 00000 (main.go:8)	FUNCDATA	$0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (main.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (main.go:8)	FUNCDATA	$2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (main.go:12)	PCDATA	$0, $0
	0x0000 00000 (main.go:12)	PCDATA	$1, $0
	0x0000 00000 (main.go:12)	MOVQ	"".set+8(SP), AX
	0x0005 00005 (main.go:12)	BTL	$3, AX
	0x0009 00009 (main.go:12)	SETCS	"".~r2+24(SP)
	0x000e 00014 (main.go:12)	RET

I benched with no speed improvement compare to ANDQ + COMPQ.

@randall77
Copy link
Contributor

Yes, it should use BTL. It's smaller, but probably not any faster than TESTQ.
It might be hard to find a benchmark which cares. I'm happy for this to go in as long as the benchmarks don't get worse.

@andybons andybons added the NeedsFix The path to resolution is known, but the work has not been done. label May 8, 2019
tomocy pushed a commit to tomocy/go that referenced this issue Sep 1, 2019
The assembly output for x & c == c, where c is power of 2:

	MOVQ	"".set+8(SP), AX
	ANDQ	$8, AX
	CMPQ	AX, $8
	SETEQ	"".~r2+24(SP)

With optimization using bitset:

	MOVQ	"".set+8(SP), AX
	BTL	$3, AX
	SETCS	"".~r2+24(SP)

output less than 1 instruction.

However, there is no speed improvement:

name         old time/op  new time/op  delta
AllBitSet-8  0.35ns ± 0%  0.35ns ± 0%   ~     (all equal)

Fixes golang#31904

Change-Id: I5dca4e410bf45716ed2145e3473979ec997e35d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/175957
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
t4n6a1ka pushed a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
The assembly output for x & c == c, where c is power of 2:

	MOVQ	"".set+8(SP), AX
	ANDQ	$8, AX
	CMPQ	AX, $8
	SETEQ	"".~r2+24(SP)

With optimization using bitset:

	MOVQ	"".set+8(SP), AX
	BTL	$3, AX
	SETCS	"".~r2+24(SP)

output less than 1 instruction.

However, there is no speed improvement:

name         old time/op  new time/op  delta
AllBitSet-8  0.35ns ± 0%  0.35ns ± 0%   ~     (all equal)

Fixes golang#31904

Change-Id: I5dca4e410bf45716ed2145e3473979ec997e35d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/175957
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@golang golang locked and limited conversation to collaborators Aug 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

6 participants