cmd/compile: compute booleans without jumps #5729

gopherbot · 2013-06-18T11:52:15Z

What steps will reproduce the problem?

Take the benchmark program at:
http://play.golang.org/p/y2nj-Icl-j

Run the benchmarks and note the difference between an int that is XORed with 1 and a bool that is flipped with "!":

go test -bench=".*"
BenchmarkBoolFlip   2000000000           1.93 ns/op
BenchmarkBoolFlipManual 2000000000           1.92 ns/op
BenchmarkBoolXor    2000000000           0.85 ns/op

This may be only a minor performance improvement in itself, but it has the additional benefit of not consuming branch predictor resources.

What is the expected output?

I would expect flag = !flag to compile to the following (gcc does this for the equivalent C code):
0013 (boolxor.go:8) XORQ    $1,AX
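The XOR form is valid because the gc toolchain stores a bool as one byte holding 0 (false) or 1 (true), so negating it is the same as flipping its low bit. A minimal sketch of that identity follows; b2u and flipByXor are hypothetical helpers (Go has no built-in bool-to-integer conversion), used only to make the byte values visible.

```go
package main

import "fmt"

// b2u is a hypothetical helper that maps false to 0 and true to 1,
// matching the byte value the compiler stores for a bool.
func b2u(b bool) uint8 {
	if b {
		return 1
	}
	return 0
}

// flipByXor negates a 0/1 byte by XORing it with 1, the operation
// the report expects XORQ $1 to perform.
func flipByXor(u uint8) uint8 {
	return u ^ 1
}

func main() {
	for _, b := range []bool{false, true} {
		// For each possible bool value, !b and b^1 give the same byte.
		fmt.Printf("b=%v  b2u(!b)=%d  b2u(b)^1=%d\n", b, b2u(!b), flipByXor(b2u(b)))
	}
}
```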

What do you see instead?

0013 (boolxor.go:8) JMP     ,16
0014 (boolxor.go:8) MOVQ    $1,AX
0015 (boolxor.go:8) JMP     ,9
0016 (boolxor.go:8) CMPB    AX,$0
0017 (boolxor.go:8) JEQ     ,14
0018 (boolxor.go:8) MOVQ    $0,AX

If jumps have to be used, reordering the branches to eliminate the first jump gives only a barely noticeable improvement on average (see BoolFlipManual: 1.92 ns/op vs. 1.93 ns/op for BoolFlip):
0034 (boolxor.go:15) CMPB    AX,$0
0035 (boolxor.go:15) JEQ     ,38
0036 (boolxor.go:16) MOVQ    $0,AX
0037 (boolxor.go:15) JMP     ,30
0038 (boolxor.go:18) MOVQ    $1,AX


Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
Mac OS X 10.8.4
Darwin 12.4.0 Darwin Kernel Version 12.4.0: Wed May  1 17:57:12 PDT 2013;
root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64

Which version are you using?  (run 'go version')
go version devel +27baf5978be6 Tue Jun 18 15:26:15 2013 +1000 darwin/amd64

Notes:
If you think minor code generation issues like these (a few too many jumps here and there, boolean expression simplifications) are better left to gccgo rather than special-cased in the standard Go compiler, please let me know.

gopherbot · 2013-06-18T12:29:35Z

Comment 1 by martisch@uos.de:

By the way: I noticed this while writing a left-leaning red-black tree implementation similar to "http://code.google.com/p/biogo/source/browse/llrb.go?repo=llrb#103", which needs to flip link colors in a few places. In a benchmark of 1,000,000 inserts, an insert takes about 1500-1510 ns/op with a boolean flipped by "!" versus about 1450-1460 ns/op with the color stored as an int and XORed:
node.linkcolor = !node.linkcolor
vs
node.linkcolor = 1 ^ node.linkcolor
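To make the comparison concrete, here is a minimal sketch of the two representations of a link color. The boolNode and intNode types and the flipBool and flipInt functions are hypothetical; they are not the biogo llrb API, only an illustration of storing the color as a bool versus as a 0/1 int during an LLRB color flip.

```go
package main

import "fmt"

// boolNode stores its link color as a bool (true means red).
type boolNode struct {
	red         bool
	left, right *boolNode
}

// intNode stores its link color as an int that is always 0 or 1 (1 means red).
type intNode struct {
	color       int
	left, right *intNode
}

// flipBool inverts the colors of h and its two children with "!".
func flipBool(h *boolNode) {
	h.red = !h.red
	h.left.red = !h.left.red
	h.right.red = !h.right.red
}

// flipInt inverts the colors of h and its two children with XOR;
// this is only correct as long as every color stays 0 or 1.
func flipInt(h *intNode) {
	h.color ^= 1
	h.left.color ^= 1
	h.right.color ^= 1
}

func main() {
	b := &boolNode{red: false, left: &boolNode{red: true}, right: &boolNode{red: true}}
	flipBool(b)
	fmt.Println(b.red, b.left.red, b.right.red) // true false false

	n := &intNode{color: 0, left: &intNode{color: 1}, right: &intNode{color: 1}}
	flipInt(n)
	fmt.Println(n.color, n.left.color, n.right.color) // 1 0 0
}
```

The int version trades type safety for the XOR: nothing stops a color from holding a value other than 0 or 1.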

rsc · 2013-06-18T12:31:40Z

Comment 2:

I don't think it is worth recognizing
if flag {
   flag = false
} else {
   flag = true
}
I think it would be fine to recognize flag = !flag.
Russ
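rsc suggests recognizing only the direct form flag = !flag. The compiler would do that matching on its own internal representation, not on Go source; purely as a source-level illustration, the sketch below uses the standard go/parser and go/ast packages to find assignments of exactly that shape. The function name findSelfNegations is hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findSelfNegations reports the source positions of assignments of the
// exact form "x = !x", where both sides name the same identifier.
func findSelfNegations(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var found []string
	ast.Inspect(file, func(n ast.Node) bool {
		as, ok := n.(*ast.AssignStmt)
		// Only plain "=" with exactly one target and one value qualifies.
		if !ok || as.Tok != token.ASSIGN || len(as.Lhs) != 1 || len(as.Rhs) != 1 {
			return true
		}
		lhs, ok := as.Lhs[0].(*ast.Ident)
		if !ok {
			return true
		}
		neg, ok := as.Rhs[0].(*ast.UnaryExpr)
		if !ok || neg.Op != token.NOT {
			return true
		}
		// The operand of ! must be the same identifier as the target.
		if rhs, ok := neg.X.(*ast.Ident); ok && rhs.Name == lhs.Name {
			found = append(found, fset.Position(as.Pos()).String())
		}
		return true
	})
	return found, nil
}

func main() {
	src := "package p\n\nfunc f(flag, other bool) bool {\n\tflag = !flag\n\tother = !flag\n\treturn flag && other\n}\n"
	found, err := findSelfNegations(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(found) // [input.go:4:2]
}
```

Because Go's "!" applies only to booleans, any match is necessarily a bool flip. The if/else form rsc declines to recognize is deliberately not matched.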

gopherbot · 2013-06-18T12:38:32Z

Comment 3 by martisch@uos.de:

I didn't mean for FlipManual to be recognized for optimization either. I only included BenchmarkBoolFlipManual to show the relative performance difference from reordering the branches, since it currently generates a better jump sequence than flag = !flag.

ianlancetaylor · 2013-06-20T16:46:52Z

Comment 4:

Labels changed: added priority-later, performance, removed priority-triage.

robpike · 2013-06-21T15:32:40Z

Comment 5:

Status changed to Accepted.

rsc · 2013-06-21T17:01:19Z

Comment 6:

The issue is that cgen of a bool value uses bgen (the bool jump generator) to compute the value. For example, in 6g the offending code is the call to bgen here:
    // these call bgen to get a bool value
    case OOROR:
    case OANDAND:
    case OEQ:
    case ONE:
    case OLT:
    case OLE:
    case OGE:
    case OGT:
    case ONOT:
        p1 = gbranch(AJMP, T, 0);
        p2 = pc;
        gmove(nodbool(1), res);
        p3 = gbranch(AJMP, T, 0);
        patch(p1, pc);
        bgen(n, 1, 0, p2);
        gmove(nodbool(0), res);
        patch(p3, pc);
        goto ret;
The fix would be to introduce a copy of bgen called bvgen that computes a boolean
instead of generating a jump, and then call bvgen here. 
The issue exists in the other compilers too.

Labels changed: added go1.2maybe.
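To illustrate the difference rsc describes, here is a toy code generator that lowers a boolean value two ways: bgenStyle materializes it with conditional jumps and constant stores, similar in spirit to the cgen/bgen sequence quoted above, while bvgenStyle computes it directly with SETcc or an XOR with 1, as the proposed bvgen would. The expr type, the function names, and the pseudo-assembly text are all hypothetical and only loosely modeled on 6g output; this is not the compiler's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// expr is a toy boolean expression: a comparison of two registers
// ("==" or "<"), or the negation ("!") of a bool held in register x.
type expr struct {
	op   string
	x, y string // y is unused for "!"
}

// Mnemonics for the comparison operators, loosely following 6g naming.
var (
	jcc   = map[string]string{"==": "JEQ", "<": "JLT"}
	setcc = map[string]string{"==": "SETEQ", "<": "SETLT"}
)

// bgenStyle materializes e into res using branches:
// jump on the condition, then store a constant 1 or 0.
func bgenStyle(e expr, res string) []string {
	var code []string
	if e.op == "!" {
		// !x is true exactly when x == 0.
		code = append(code, "CMPB "+e.x+", $0", "JEQ Ltrue")
	} else {
		code = append(code, "CMPQ "+e.x+", "+e.y, jcc[e.op]+" Ltrue")
	}
	return append(code,
		"MOVB $0, "+res,
		"JMP Ldone",
		"Ltrue: MOVB $1, "+res,
		"Ldone:")
}

// bvgenStyle computes e into res without jumps:
// SETcc for comparisons, XOR with 1 for negation of a 0/1 byte.
func bvgenStyle(e expr, res string) []string {
	if e.op == "!" {
		return []string{"MOVB " + e.x + ", " + res, "XORB $1, " + res}
	}
	return []string{"CMPQ " + e.x + ", " + e.y, setcc[e.op] + " " + res}
}

// countJumps counts emitted instructions whose mnemonic starts with J.
func countJumps(code []string) int {
	n := 0
	for _, line := range code {
		// Drop a leading "label: " so only the mnemonic is checked.
		if i := strings.Index(line, ": "); i >= 0 {
			line = line[i+2:]
		}
		if strings.HasPrefix(line, "J") {
			n++
		}
	}
	return n
}

func main() {
	exprs := []expr{{op: "!", x: "AX"}, {op: "==", x: "BX", y: "BP"}}
	for _, e := range exprs {
		jumpy, flat := bgenStyle(e, "CX"), bvgenStyle(e, "CX")
		fmt.Printf("op %q: %d jumps with branches, %d without\n", e.op, countJumps(jumpy), countJumps(flat))
		fmt.Println("  " + strings.Join(flat, "; "))
	}
}
```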

rsc · 2013-07-30T22:21:19Z

Comment 7:

See also issue #4397.

rsc · 2013-07-30T22:38:44Z

Comment 8:

Labels changed: added feature.

robpike · 2013-08-29T03:10:00Z

Comment 9:

Not for 1.2.

Labels changed: removed go1.2maybe.

rsc · 2013-11-27T18:45:42Z

Comment 10:

Labels changed: added go1.3maybe.

rsc · 2013-11-27T20:29:16Z

Comment 11:

Labels changed: removed feature.

rsc · 2013-12-04T01:29:04Z

Comment 12:

Labels changed: added release-none, removed go1.3maybe.

rsc · 2013-12-04T01:43:48Z

Comment 13:

Labels changed: added repo-main.

josharian · 2014-08-18T17:21:31Z

Comment 14:

Labels changed: removed priority-later.

Owner changed to @josharian.

Status changed to Started.

josharian · 2014-08-20T18:14:28Z

Comment 15:

WIP CL for 6g: https://golang.org/cl/129400044/. Kibitzing welcomed.

Use SETcc instructions instead of Jcc to generate boolean values. This generates shorter, jump-free code, which may also enable other peephole optimizations.

For example, given

    func f(i, j int) bool { return i == j }

Before

    "".f t=1 size=32 value=0 args=0x18 locals=0x0
        0x0000 00000 (p.go:3) TEXT     "".f+0(SB),4,$0-24
        0x0000 00000 (p.go:3) FUNCDATA $0,gclocals·f90cfd099b5ec2b453c391fece9d42bb+0(SB)
        0x0000 00000 (p.go:3) FUNCDATA $1,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)
        0x0000 00000 (p.go:4) MOVQ     "".i+8(FP),BX
        0x0005 00005 (p.go:4) MOVQ     "".j+16(FP),BP
        0x000a 00010 (p.go:4) CMPQ     BX,BP
        0x000d 00013 (p.go:4) JEQ      ,21
        0x000f 00015 (p.go:4) MOVB     $0,"".~r2+24(FP)
        0x0014 00020 (p.go:4) RET      ,
        0x0015 00021 (p.go:4) MOVB     $1,"".~r2+24(FP)
        0x001a 00026 (p.go:4) JMP      ,20

After

    "".f t=1 size=32 value=0 args=0x18 locals=0x0
        0x0000 00000 (p.go:3) TEXT     "".f+0(SB),4,$0-24
        0x0000 00000 (p.go:3) FUNCDATA $0,gclocals·f90cfd099b5ec2b453c391fece9d42bb+0(SB)
        0x0000 00000 (p.go:3) FUNCDATA $1,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)
        0x0000 00000 (p.go:4) MOVQ     "".i+8(FP),BX
        0x0005 00005 (p.go:4) MOVQ     "".j+16(FP),BP
        0x000a 00010 (p.go:4) CMPQ     BX,BP
        0x000d 00013 (p.go:4) SETEQ    ,"".~r2+24(FP)
        0x0012 00018 (p.go:4) RET      ,

Top movers in stdlib benchmarks:

    benchmark                             old ns/op   new ns/op   delta
    BenchmarkNextafter64                  7.72        5.86        -24.09%
    BenchmarkSignbit                      2.13        2.78        +30.52%
    BenchmarkNextafter32                  7.98        6.17        -22.68%
    BenchmarkContendedSemaphore           72.0        86.1        +19.58%
    BenchmarkIndexByte32                  6.94        8.28        +19.31%
    BenchmarkIndexRuneFastPath            27.3        22.9        -16.12%
    BenchmarkUnquoteEasy                  190         217         +14.21%
    BenchmarkCompareBytesToNil            6.92        6.13        -11.42%
    BenchmarkComplex128DivNisNaN          11.4        10.1        -11.40%
    BenchmarkEqual16                      7.18        6.38        -11.14%
    BenchmarkEqual9                       7.18        6.38        -11.14%
    BenchmarkEqual15                      7.18        6.41        -10.72%
    BenchmarkSearchWrappers               162         145         -10.49%
    BenchmarkEqual20                      8.01        7.18        -10.36%
    BenchmarkCompareBytesBigIdentical     5.32        4.79        -9.96%
    BenchmarkInterfaceSmall               15.2        13.8        -9.21%
    BenchmarkEqual32                      8.81        8.00        -9.19%
    BenchmarkTrimSpace                    52.7        58.0        +10.06%
    BenchmarkCompareBytesEmpty            5.87        5.34        -9.03%
    BenchmarkMapStringKeysEight_64        21.8        23.9        +9.63%
    BenchmarkMapStringKeysEight_16        22.8        20.8        -8.77%
    BenchmarkMapStringKeysEight_1M        21.9        24.0        +9.59%
    BenchmarkMapStringKeysEight_32        21.9        24.0        +9.59%
    BenchmarkCompareBytesIdentical        5.85        5.34        -8.72%
    BenchmarkAcosh                        32.2        35.2        +9.32%
    BenchmarkMin                          3.20        2.93        -8.44%
    BenchmarkIPString                     2120        1954        -7.83%
    BenchmarkChanSem                      61.9        57.3        -7.43%
    BenchmarkCompareBytesDifferentLength  6.91        6.40        -7.38%
    BenchmarkCompareBytesEqual            6.91        6.40        -7.38%
    BenchmarkCompareBytesSameLength       6.92        6.41        -7.37%

All regressions I've investigated are due to incidental code movement.

The godoc binary is ~0.2% smaller after this CL.

Updates golang#5729.

Other architectures will be done in subsequent CLs.

Change-Id: I0e167e259274b722958567fc0af83a17ca002da7

Use SETcc instructions instead of Jcc to generate boolean values. This generates shorter, jump-free code, which may in turn enable other peephole optimizations.

For example, given

    func f(i, j int) bool { return i == j }

Before

    "".f t=1 size=32 value=0 args=0x18 locals=0x0
        0x0000 00000 (x.go:3) TEXT     "".f(SB), $0-24
        0x0000 00000 (x.go:3) FUNCDATA $0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB)
        0x0000 00000 (x.go:3) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (x.go:4) MOVQ     "".i+8(FP), BX
        0x0005 00005 (x.go:4) MOVQ     "".j+16(FP), BP
        0x000a 00010 (x.go:4) CMPQ     BX, BP
        0x000d 00013 (x.go:4) JEQ      21
        0x000f 00015 (x.go:4) MOVB     $0, "".~r2+24(FP)
        0x0014 00020 (x.go:4) RET
        0x0015 00021 (x.go:4) MOVB     $1, "".~r2+24(FP)
        0x001a 00026 (x.go:4) JMP      20

After

    "".f t=1 size=32 value=0 args=0x18 locals=0x0
        0x0000 00000 (x.go:3) TEXT     "".f(SB), $0-24
        0x0000 00000 (x.go:3) FUNCDATA $0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB)
        0x0000 00000 (x.go:3) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (x.go:4) MOVQ     "".i+8(FP), BX
        0x0005 00005 (x.go:4) MOVQ     "".j+16(FP), BP
        0x000a 00010 (x.go:4) CMPQ     BX, BP
        0x000d 00013 (x.go:4) SETEQ    "".~r2+24(FP)
        0x0012 00018 (x.go:4) RET

regexp benchmarks, best of 12 runs:

    benchmark                              old ns/op    new ns/op    delta
    BenchmarkNotOnePassShortB              782          733          -6.27%
    BenchmarkLiteral                       180          171          -5.00%
    BenchmarkNotLiteral                    2855         2721         -4.69%
    BenchmarkMatchHard_32                  2672         2557         -4.30%
    BenchmarkMatchHard_1K                  80182        76732        -4.30%
    BenchmarkMatchEasy1_32M                76440180     73304748     -4.10%
    BenchmarkMatchEasy1_32K                68798        66350        -3.56%
    BenchmarkAnchoredLongMatch             482          465          -3.53%
    BenchmarkMatchEasy1_1M                 2373042      2292692      -3.39%
    BenchmarkReplaceAll                    2776         2690         -3.10%
    BenchmarkNotOnePassShortA              1397         1360         -2.65%
    BenchmarkMatchClass_InRange            3842         3742         -2.60%
    BenchmarkMatchEasy0_32                 125          122          -2.40%
    BenchmarkMatchEasy0_32K                11414        11164        -2.19%
    BenchmarkMatchEasy0_1K                 668          654          -2.10%
    BenchmarkAnchoredShortMatch            260          255          -1.92%
    BenchmarkAnchoredLiteralShortNonMatch  164          161          -1.83%
    BenchmarkOnePassShortB                 623          612          -1.77%
    BenchmarkOnePassShortA                 801          788          -1.62%
    BenchmarkMatchClass                    4094         4033         -1.49%
    BenchmarkMatchEasy0_32M                14078800     13890704     -1.34%
    BenchmarkMatchHard_32K                 4095844      4045820      -1.22%
    BenchmarkMatchEasy1_1K                 1663         1643         -1.20%
    BenchmarkMatchHard_1M                  131261708    129708215    -1.18%
    BenchmarkMatchHard_32M                 4210112412   4169292003   -0.97%
    BenchmarkMatchMedium_32K               2460752      2438611      -0.90%
    BenchmarkMatchEasy0_1M                 422914       419672       -0.77%
    BenchmarkMatchMedium_1M                78581121     78040160     -0.69%
    BenchmarkMatchMedium_32M               2515287278   2498464906   -0.67%
    BenchmarkMatchMedium_32                1754         1746         -0.46%
    BenchmarkMatchMedium_1K                52105        52106        +0.00%
    BenchmarkAnchoredLiteralLongNonMatch   185          185          +0.00%
    BenchmarkMatchEasy1_32                 107          107          +0.00%
    BenchmarkOnePassLongNotPrefix          505          505          +0.00%
    BenchmarkOnePassLongPrefix             147          147          +0.00%

The godoc binary is ~0.12% smaller after this CL.

Updates #5729.

toolstash -cmp passes for all architectures other than amd64 and amd64p32. Other architectures can be done in follow-up CLs.

Change-Id: I0e167e259274b722958567fc0af83a17ca002da7
Reviewed-on: https://go-review.googlesource.com/2284
Reviewed-by: Russ Cox <rsc@golang.org>

alexey-s-sidorov · 2016-02-26T14:40:54Z

Bug author's results:

bench and notice the difference between an int that gets xor with 1 and a bool that gets
flipped.

go test -bench=".*"
BenchmarkBoolFlip 2000000000 1.93 ns/op
BenchmarkBoolFlipManual 2000000000 1.92 ns/op
BenchmarkBoolXor 2000000000 0.85 ns/op

My results:

go test -bench=".*"
BenchmarkBoolFlip 2000000000 0.70 ns/op
BenchmarkBoolFlipManual 2000000000 1.01 ns/op
BenchmarkBoolXor 2000000000 0.71 ns/op

There is no difference between an int that is XORed with 1 and a bool that is flipped.

It looks like dev.ssa recognizes the bool flip.

randall77 · 2016-02-26T16:49:04Z

Note that your test code doesn't demonstrate what you think it does. What you're measuring is the fact that SSA notices that your bool is dead and removes the code for flag=!flag altogether.

But put in a use of the resulting bool, and you'll see that SSA does use ^1 for bool flip.
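A minimal sketch of the fix randall77 suggests: store the flipped value into a package-level variable after the loop so the compiler cannot treat it as dead. The benchmark function names and the sinkBool/sinkInt variables are hypothetical reconstructions, not the code behind the playground link. The sketch uses testing.Benchmark, which the standard testing package provides for running benchmarks outside go test; timings depend on machine and compiler version, so none are shown.

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks: writing the final value here keeps it live, so the
// compiler cannot delete the loop body as dead code.
var (
	sinkBool bool
	sinkInt  int
)

// benchBoolFlip flips a bool with "!" and keeps the result live.
func benchBoolFlip(b *testing.B) {
	flag := false
	for i := 0; i < b.N; i++ {
		flag = !flag
	}
	sinkBool = flag
}

// benchBoolFlipManual flips a bool with an explicit if/else.
func benchBoolFlipManual(b *testing.B) {
	flag := false
	for i := 0; i < b.N; i++ {
		if flag {
			flag = false
		} else {
			flag = true
		}
	}
	sinkBool = flag
}

// benchBoolXor flips a 0/1 int with XOR.
func benchBoolXor(b *testing.B) {
	x := 0
	for i := 0; i < b.N; i++ {
		x ^= 1
	}
	sinkInt = x
}

// benchBoolFlipDead flips a bool but never uses the final value, so an
// optimizing compiler is free to remove the loop body entirely.
func benchBoolFlipDead(b *testing.B) {
	flag := false
	for i := 0; i < b.N; i++ {
		flag = !flag
	}
	_ = flag
}

func main() {
	benches := []struct {
		name string
		fn   func(*testing.B)
	}{
		{"BoolFlip", benchBoolFlip},
		{"BoolFlipManual", benchBoolFlipManual},
		{"BoolXor", benchBoolXor},
		{"BoolFlipDead", benchBoolFlipDead},
	}
	for _, bm := range benches {
		r := testing.Benchmark(bm.fn)
		fmt.Printf("%-16s %s\n", bm.name, r)
	}
}
```

With a recent compiler, BoolFlipDead may report little more than empty-loop overhead, which is the effect randall77 describes; the other three keep their results live and so measure the flip itself.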

gopherbot added started Performance labels Aug 20, 2014

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed the release-none label Apr 10, 2015

rsc removed the repo-main label Apr 14, 2015

rsc changed the title from "cmd/gc: compute booleans without jumps" to "cmd/compile: compute booleans without jumps" Jun 8, 2015

bradfitz closed this as completed Feb 26, 2016

golang locked and limited conversation to collaborators Feb 28, 2017

gopherbot added the FrozenDueToAge label Feb 28, 2017
