cmd/compile: compute booleans without jumps #5729

Closed
gopherbot opened this issue Jun 18, 2013 · 17 comments

@gopherbot

by martisch@uos.de:

What steps will reproduce the problem?

Take this program:
http://play.golang.org/p/y2nj-Icl-j

Benchmark it and compare an int that is XORed with 1 against a bool that is
flipped:

go test -bench=".*"
BenchmarkBoolFlip   2000000000           1.93 ns/op
BenchmarkBoolFlipManual 2000000000           1.92 ns/op
BenchmarkBoolXor    2000000000           0.85 ns/op

The speedup itself may be minor, but the jump-free form also has the side
effect of not consuming branch predictor resources.
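
The playground source is not reproduced in this thread, so the following is only a guess at the shape of the three variants being compared. The function names `boolFlip`, `boolFlipManual`, and `intXor` are made up here to mirror the benchmark names above; the program just checks that all three compute the same flip.

```go
package main

import "fmt"

// boolFlip is the form the issue asks the compiler to turn into an XOR.
func boolFlip(flag bool) bool { return !flag }

// boolFlipManual spells the flip out as explicit branches (hypothetical
// counterpart of BenchmarkBoolFlipManual).
func boolFlipManual(flag bool) bool {
	if flag {
		return false
	}
	return true
}

// intXor represents the flag as 0 or 1 and flips it with XOR.
func intXor(flag int) int { return flag ^ 1 }

func main() {
	for _, f := range []bool{false, true} {
		i := 0
		if f {
			i = 1
		}
		fmt.Println(f, boolFlip(f), boolFlipManual(f), intXor(i))
	}
}
```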

What is the expected output?

I would expect flag = !flag to be compiled to the following (gcc does this for C too):
0013 (boolxor.go:8) XORQ    $1,AX

What do you see instead?

0013 (boolxor.go:8) JMP     ,16
0014 (boolxor.go:8) MOVQ    $1,AX
0015 (boolxor.go:8) JMP     ,9
0016 (boolxor.go:8) CMPB    AX,$0
0017 (boolxor.go:8) JEQ     ,14
0018 (boolxor.go:8) MOVQ    $0,AX

If jumps have to be used, reordering them and eliminating the first jump yields only a
barely noticeable improvement of about 0.01 ns/op in the benchmark on average (see
BenchmarkBoolFlipManual):
0034 (boolxor.go:15) CMPB    AX,$0
0035 (boolxor.go:15) JEQ     ,38
0036 (boolxor.go:16) MOVQ    $0,AX
0037 (boolxor.go:15) JMP     ,30
0038 (boolxor.go:18) MOVQ    $1,AX


Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
Mac OS X 10.8.4
Darwin 12.4.0 Darwin Kernel Version 12.4.0: Wed May  1 17:57:12 PDT 2013;
root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64

Which version are you using?  (run 'go version')
go version devel +27baf5978be6 Tue Jun 18 15:26:15 2013 +1000 darwin/amd64

Notes:
If you think such minor code generation issues (a few jumps too many here and there,
boolean expression simplifications) are better left to gccgo and should not be
special-cased in the standard Go compiler, please let me know.
@gopherbot
Author

Comment 1 by martisch@uos.de:

By the way, I noticed this while writing a left-leaning red-black tree implementation
similar to "http://code.google.com/p/biogo/source/browse/llrb.go?repo=llrb#103" that
needs to flip link colors in a few places. In benchmarks of 1000000 inserts it seems to
make a difference per insert: 1500-1510 ns/op with a boolean and "!", versus
1450-1460 ns/op when the boolean is made an int and XORed:
node.linkcolor = !node.linkcolor
vs
node.linkcolor = 1 ^ node.linkcolor
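
For readers who have not seen the pattern, a color flip in a left-leaning red-black tree might look roughly like this. It is a sketch only: the types `boolNode` and `intNode` and the method `flipColors` are hypothetical and are not taken from the biogo code linked above; only the field name `linkcolor` follows the two lines quoted in the comment.

```go
package main

import "fmt"

// boolNode stores the link color as a bool (true = red).
type boolNode struct {
	linkcolor   bool
	left, right *boolNode
}

// flipColors inverts the color of a node and of both children using "!".
func (n *boolNode) flipColors() {
	n.linkcolor = !n.linkcolor
	n.left.linkcolor = !n.left.linkcolor
	n.right.linkcolor = !n.right.linkcolor
}

// intNode stores the link color as an integer (1 = red, 0 = black).
type intNode struct {
	linkcolor   uint8
	left, right *intNode
}

// flipColors does the same flip with XOR, which needs no branch.
func (n *intNode) flipColors() {
	n.linkcolor = 1 ^ n.linkcolor
	n.left.linkcolor = 1 ^ n.left.linkcolor
	n.right.linkcolor = 1 ^ n.right.linkcolor
}

func main() {
	b := &boolNode{left: &boolNode{linkcolor: true}, right: &boolNode{linkcolor: true}}
	b.flipColors()
	fmt.Println(b.linkcolor, b.left.linkcolor, b.right.linkcolor)

	i := &intNode{left: &intNode{linkcolor: 1}, right: &intNode{linkcolor: 1}}
	i.flipColors()
	fmt.Println(i.linkcolor, i.left.linkcolor, i.right.linkcolor)
}
```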

@rsc
Contributor

rsc commented Jun 18, 2013

Comment 2:

I don't think it is worth recognizing
if flag {
   flag = false
} else {
   flag = true
}
I think it would be fine to recognize flag = !flag.
Russ

@gopherbot
Author

Comment 3 by martisch@uos.de:

I didn't mean for FlipManual to be recognized for optimization either.
I only included BenchmarkBoolFlipManual to show the relative performance difference
from reordering branches, because it currently generates a better jump sequence than
flag = !flag.

@ianlancetaylor
Contributor

Comment 4:

Labels changed: added priority-later, performance, removed priority-triage.

@robpike
Contributor

robpike commented Jun 21, 2013

Comment 5:

Status changed to Accepted.

@rsc
Contributor

rsc commented Jun 21, 2013

Comment 6:

The issue is that cgen of a bool value uses bgen (bool jump generator) to compute the
value. For example in 6g the offending code is the call to bgen here:
    // these call bgen to get a bool value
    case OOROR:
    case OANDAND:
    case OEQ:
    case ONE:
    case OLT:
    case OLE:
    case OGE:
    case OGT:
    case ONOT:
        p1 = gbranch(AJMP, T, 0);
        p2 = pc;
        gmove(nodbool(1), res);
        p3 = gbranch(AJMP, T, 0);
        patch(p1, pc);
        bgen(n, 1, 0, p2);
        gmove(nodbool(0), res);
        patch(p3, pc);
        goto ret;
The fix would be to introduce a copy of bgen called bvgen that computes a boolean
instead of generating a jump, and then call bvgen here. 
The issue exists in the other compilers too.
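
To make the difference concrete, here is a toy sketch in Go. It is not the compiler's code: `emitJumpStyle`, `emitSetStyle`, and `countJumps` are hypothetical helpers, and the pseudo-instructions are illustrative only. The first emitter follows the jump/patch order of the cgen case quoted above for `res = (a == b)`; the second shows the kind of jump-free sequence a value-computing generator could produce instead, assuming the target has a set-on-condition instruction.

```go
package main

import (
	"fmt"
	"strings"
)

// emitJumpStyle mirrors the layout produced by the quoted cgen case:
// jump to the test, a "true" block, a jump to the end, the test itself
// (which branches back to "true"), and a fall-through "false" store.
func emitJumpStyle(a, b, res string) []string {
	return []string{
		"JMP test",
		"true:",
		"MOV $1, " + res,
		"JMP end",
		"test:",
		"CMP " + a + ", " + b,
		"JEQ true",
		"MOV $0, " + res,
		"end:",
	}
}

// emitSetStyle computes the boolean directly from the condition flags.
func emitSetStyle(a, b, res string) []string {
	return []string{
		"CMP " + a + ", " + b,
		"SETEQ " + res,
	}
}

// countJumps counts pseudo-instructions whose mnemonic starts with "J".
func countJumps(prog []string) int {
	n := 0
	for _, ins := range prog {
		if strings.HasPrefix(ins, "J") {
			n++
		}
	}
	return n
}

func main() {
	for _, prog := range [][]string{
		emitJumpStyle("AX", "BX", "CX"),
		emitSetStyle("AX", "BX", "CX"),
	} {
		fmt.Println(strings.Join(prog, "\n"))
		fmt.Println("jumps:", countJumps(prog))
	}
}
```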

Labels changed: added go1.2maybe.

@rsc
Contributor

rsc commented Jul 30, 2013

Comment 7:

See also issue #4397.

@rsc
Contributor

rsc commented Jul 30, 2013

Comment 8:

Labels changed: added feature.

@robpike
Contributor

robpike commented Aug 29, 2013

Comment 9:

Not for 1.2.

Labels changed: removed go1.2maybe.

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 10:

Labels changed: added go1.3maybe.

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 11:

Labels changed: removed feature.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 12:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 13:

Labels changed: added repo-main.

@josharian
Contributor

Comment 14:

Labels changed: removed priority-later.

Owner changed to @josharian.

Status changed to Started.

@josharian
Contributor

Comment 15:

WIP CL for 6g: https://golang.org/cl/129400044/. Kibitzing welcomed.

josharian added a commit to josharian/go that referenced this issue Jan 6, 2015
Use SETcc instructions instead of Jcc to generate boolean values.
This generates shorter, jump-free code, which may also enable other
peephole optimizations.

For example, given

func f(i, j int) bool {
	return i == j
}

Before

"".f t=1 size=32 value=0 args=0x18 locals=0x0
	0x0000 00000 (p.go:3)	TEXT	"".f+0(SB),4,$0-24
	0x0000 00000 (p.go:3)	FUNCDATA	$0,gclocals·f90cfd099b5ec2b453c391fece9d42bb+0(SB)
	0x0000 00000 (p.go:3)	FUNCDATA	$1,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)
	0x0000 00000 (p.go:4)	MOVQ	"".i+8(FP),BX
	0x0005 00005 (p.go:4)	MOVQ	"".j+16(FP),BP
	0x000a 00010 (p.go:4)	CMPQ	BX,BP
	0x000d 00013 (p.go:4)	JEQ	,21
	0x000f 00015 (p.go:4)	MOVB	$0,"".~r2+24(FP)
	0x0014 00020 (p.go:4)	RET	,
	0x0015 00021 (p.go:4)	MOVB	$1,"".~r2+24(FP)
	0x001a 00026 (p.go:4)	JMP	,20

After

"".f t=1 size=32 value=0 args=0x18 locals=0x0
	0x0000 00000 (p.go:3)	TEXT	"".f+0(SB),4,$0-24
	0x0000 00000 (p.go:3)	FUNCDATA	$0,gclocals·f90cfd099b5ec2b453c391fece9d42bb+0(SB)
	0x0000 00000 (p.go:3)	FUNCDATA	$1,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)
	0x0000 00000 (p.go:4)	MOVQ	"".i+8(FP),BX
	0x0005 00005 (p.go:4)	MOVQ	"".j+16(FP),BP
	0x000a 00010 (p.go:4)	CMPQ	BX,BP
	0x000d 00013 (p.go:4)	SETEQ	,"".~r2+24(FP)
	0x0012 00018 (p.go:4)	RET	,

Top movers in stdlib benchmarks:

benchmark                                    old ns/op      new ns/op      delta
BenchmarkNextafter64                         7.72           5.86           -24.09%
BenchmarkSignbit                             2.13           2.78           +30.52%
BenchmarkNextafter32                         7.98           6.17           -22.68%
BenchmarkContendedSemaphore                  72.0           86.1           +19.58%
BenchmarkIndexByte32                         6.94           8.28           +19.31%
BenchmarkIndexRuneFastPath                   27.3           22.9           -16.12%
BenchmarkUnquoteEasy                         190            217            +14.21%
BenchmarkCompareBytesToNil                   6.92           6.13           -11.42%
BenchmarkComplex128DivNisNaN                 11.4           10.1           -11.40%
BenchmarkEqual16                             7.18           6.38           -11.14%
BenchmarkEqual9                              7.18           6.38           -11.14%
BenchmarkEqual15                             7.18           6.41           -10.72%
BenchmarkSearchWrappers                      162            145            -10.49%
BenchmarkEqual20                             8.01           7.18           -10.36%
BenchmarkCompareBytesBigIdentical            5.32           4.79           -9.96%
BenchmarkInterfaceSmall                      15.2           13.8           -9.21%
BenchmarkEqual32                             8.81           8.00           -9.19%
BenchmarkTrimSpace                           52.7           58.0           +10.06%
BenchmarkCompareBytesEmpty                   5.87           5.34           -9.03%
BenchmarkMapStringKeysEight_64               21.8           23.9           +9.63%
BenchmarkMapStringKeysEight_16               22.8           20.8           -8.77%
BenchmarkMapStringKeysEight_1M               21.9           24.0           +9.59%
BenchmarkMapStringKeysEight_32               21.9           24.0           +9.59%
BenchmarkCompareBytesIdentical               5.85           5.34           -8.72%
BenchmarkAcosh                               32.2           35.2           +9.32%
BenchmarkMin                                 3.20           2.93           -8.44%
BenchmarkIPString                            2120           1954           -7.83%
BenchmarkChanSem                             61.9           57.3           -7.43%
BenchmarkCompareBytesDifferentLength         6.91           6.40           -7.38%
BenchmarkCompareBytesEqual                   6.91           6.40           -7.38%
BenchmarkCompareBytesSameLength              6.92           6.41           -7.37%

All regressions I've investigated are due to incidental code movement.

The godoc binary is ~0.2% smaller after this CL.

Updates golang#5729. Other architectures will be done in subsequent CLs.

Change-Id: I0e167e259274b722958567fc0af83a17ca002da7
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@rsc rsc removed the release-none label Apr 10, 2015
josharian added a commit to josharian/go that referenced this issue Apr 10, 2015
Use SETcc instructions instead of Jcc to generate boolean values.
This generates shorter, jump-free code, which may in turn enable other
peephole optimizations.

For example, given

func f(i, j int) bool {
	return i == j
}

Before

"".f t=1 size=32 value=0 args=0x18 locals=0x0
	0x0000 00000 (x.go:3)	TEXT	"".f(SB), $0-24
	0x0000 00000 (x.go:3)	FUNCDATA	$0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB)
	0x0000 00000 (x.go:3)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:4)	MOVQ	"".i+8(FP), BX
	0x0005 00005 (x.go:4)	MOVQ	"".j+16(FP), BP
	0x000a 00010 (x.go:4)	CMPQ	BX, BP
	0x000d 00013 (x.go:4)	JEQ	21
	0x000f 00015 (x.go:4)	MOVB	$0, "".~r2+24(FP)
	0x0014 00020 (x.go:4)	RET
	0x0015 00021 (x.go:4)	MOVB	$1, "".~r2+24(FP)
	0x001a 00026 (x.go:4)	JMP	20

After

"".f t=1 size=32 value=0 args=0x18 locals=0x0
	0x0000 00000 (x.go:3)	TEXT	"".f(SB), $0-24
	0x0000 00000 (x.go:3)	FUNCDATA	$0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB)
	0x0000 00000 (x.go:3)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:4)	MOVQ	"".i+8(FP), BX
	0x0005 00005 (x.go:4)	MOVQ	"".j+16(FP), BP
	0x000a 00010 (x.go:4)	CMPQ	BX, BP
	0x000d 00013 (x.go:4)	SETEQ	"".~r2+24(FP)
	0x0012 00018 (x.go:4)	RET

regexp benchmarks, best of 12 runs:

benchmark                                 old ns/op      new ns/op      delta
BenchmarkNotOnePassShortB                 782            733            -6.27%
BenchmarkLiteral                          180            171            -5.00%
BenchmarkNotLiteral                       2855           2721           -4.69%
BenchmarkMatchHard_32                     2672           2557           -4.30%
BenchmarkMatchHard_1K                     80182          76732          -4.30%
BenchmarkMatchEasy1_32M                   76440180       73304748       -4.10%
BenchmarkMatchEasy1_32K                   68798          66350          -3.56%
BenchmarkAnchoredLongMatch                482            465            -3.53%
BenchmarkMatchEasy1_1M                    2373042        2292692        -3.39%
BenchmarkReplaceAll                       2776           2690           -3.10%
BenchmarkNotOnePassShortA                 1397           1360           -2.65%
BenchmarkMatchClass_InRange               3842           3742           -2.60%
BenchmarkMatchEasy0_32                    125            122            -2.40%
BenchmarkMatchEasy0_32K                   11414          11164          -2.19%
BenchmarkMatchEasy0_1K                    668            654            -2.10%
BenchmarkAnchoredShortMatch               260            255            -1.92%
BenchmarkAnchoredLiteralShortNonMatch     164            161            -1.83%
BenchmarkOnePassShortB                    623            612            -1.77%
BenchmarkOnePassShortA                    801            788            -1.62%
BenchmarkMatchClass                       4094           4033           -1.49%
BenchmarkMatchEasy0_32M                   14078800       13890704       -1.34%
BenchmarkMatchHard_32K                    4095844        4045820        -1.22%
BenchmarkMatchEasy1_1K                    1663           1643           -1.20%
BenchmarkMatchHard_1M                     131261708      129708215      -1.18%
BenchmarkMatchHard_32M                    4210112412     4169292003     -0.97%
BenchmarkMatchMedium_32K                  2460752        2438611        -0.90%
BenchmarkMatchEasy0_1M                    422914         419672         -0.77%
BenchmarkMatchMedium_1M                   78581121       78040160       -0.69%
BenchmarkMatchMedium_32M                  2515287278     2498464906     -0.67%
BenchmarkMatchMedium_32                   1754           1746           -0.46%
BenchmarkMatchMedium_1K                   52105          52106          +0.00%
BenchmarkAnchoredLiteralLongNonMatch      185            185            +0.00%
BenchmarkMatchEasy1_32                    107            107            +0.00%
BenchmarkOnePassLongNotPrefix             505            505            +0.00%
BenchmarkOnePassLongPrefix                147            147            +0.00%

The godoc binary is ~0.12% smaller after this CL.

Updates golang#5729.

toolstash -cmp passes for all architectures other than amd64 and amd64p32.

Other architectures can be done in follow-up CLs.

Change-Id: I0e167e259274b722958567fc0af83a17ca002da7
josharian added a commit to josharian/go that referenced this issue Apr 14, 2015
@rsc rsc removed the repo-main label Apr 14, 2015
josharian added a commit that referenced this issue Apr 17, 2015
Change-Id: I0e167e259274b722958567fc0af83a17ca002da7
Reviewed-on: https://go-review.googlesource.com/2284
Reviewed-by: Russ Cox <rsc@golang.org>
@rsc rsc changed the title cmd/gc: compute booleans without jumps cmd/compile: compute booleans without jumps Jun 8, 2015
@alexey-s-sidorov

Bug author's results:

bench and notice the difference between an int that gets xor with 1 and a bool that gets
flipped.

go test -bench=".*"
BenchmarkBoolFlip 2000000000 1.93 ns/op
BenchmarkBoolFlipManual 2000000000 1.92 ns/op
BenchmarkBoolXor 2000000000 0.85 ns/op

My results:

go test -bench=".*"
BenchmarkBoolFlip 2000000000 0.70 ns/op
BenchmarkBoolFlipManual 2000000000 1.01 ns/op
BenchmarkBoolXor 2000000000 0.71 ns/op

There is no difference between an int that is XORed with 1 and a bool that is
flipped.

Looks like dev.ssa recognizes bool flip.

@randall77
Contributor

Note that your test code doesn't demonstrate what you think it does. What you're measuring is the fact that SSA notices that your bool is dead and removes the code for flag=!flag altogether.

But put in a use of the resulting bool, and you'll see that SSA does use ^1 for bool flip.
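
One common way to "put in a use" is to store the result in a package-level variable. The sketch below is an assumption about how such a benchmark could be written, not the code from the playground link: `sink`, `flipDead`, and `flipLive` are hypothetical names, and `testing.Benchmark` is used only so the example runs as an ordinary program. Whether the dead loop is actually removed depends on the compiler version.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level, so a value stored here counts as used.
var sink bool

// flipDead flips a local bool that is never read afterwards; an optimizing
// compiler is free to delete the flip entirely.
func flipDead(n int) {
	flag := false
	for i := 0; i < n; i++ {
		flag = !flag
	}
}

// flipLive returns the final value, so the flips feed a result the caller
// can keep alive.
func flipLive(n int) bool {
	flag := false
	for i := 0; i < n; i++ {
		flag = !flag
	}
	return flag
}

func main() {
	dead := testing.Benchmark(func(b *testing.B) { flipDead(b.N) })
	live := testing.Benchmark(func(b *testing.B) { sink = flipLive(b.N) })
	fmt.Println("dead:", dead)
	fmt.Println("live:", live)
}
```

If the two timings differ noticeably, the "dead" number is measuring an empty or nearly empty loop rather than the flip.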

@golang golang locked and limited conversation to collaborators Feb 28, 2017