New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: compute booleans without jumps #5729
Comments
Comment 1 by martisch@uos.de: btw: i noticed this while programming a left leaning red black tree implementation similar to "http://code.google.com/p/biogo/source/browse/llrb.go?repo=llrb#103" that needs to flip link colors in a few places. for benchmarks of 1000000 inserts it seems to make a difference between 1510-1500 ns/op (boolean "!") and 1460-1450 ns/op (making the boolean an int and xor it) for an insert. node.linkcolor = !node.linkcolor vs node.linkcolor = 1 ^ node.linkcolor |
Comment 3 by martisch@uos.de: i didnt mean for FlipManual to be recognized for optimization either. i just included BenchmarkBoolFlipManual to show the relative difference of performance when reordering branches because it generates a better jump sequence than flag = !flag currently. |
The issue is that cgen of a bool value uses bgen (bool jump generator) to compute the value. For example in 6g the offending code is the call to bgen here: // these call bgen to get a bool value case OOROR: case OANDAND: case OEQ: case ONE: case OLT: case OLE: case OGE: case OGT: case ONOT: p1 = gbranch(AJMP, T, 0); p2 = pc; gmove(nodbool(1), res); p3 = gbranch(AJMP, T, 0); patch(p1, pc); bgen(n, 1, 0, p2); gmove(nodbool(0), res); patch(p3, pc); goto ret; The fix would be to introduce a copy of bgen called bvgen that computes a boolean instead of generating a jump, and then call bvgen here. The issue exists in the other compilers too. Labels changed: added go1.2maybe. |
See also issue #4397. |
Labels changed: removed priority-later. Owner changed to @josharian. Status changed to Started. |
WIP CL for 6g: https://golang.org/cl/129400044/. Kibitzing welcomed. |
Use SETcc instructions instead of Jcc to generate boolean values. This generates shorter, jump-free code, which may also enable other peephole optimizations. For example, given func f(i, j int) bool { return i == j } Before "".f t=1 size=32 value=0 args=0x18 locals=0x0 0x0000 00000 (p.go:3) TEXT "".f+0(SB),4,$0-24 0x0000 00000 (p.go:3) FUNCDATA $0,gclocals·f90cfd099b5ec2b453c391fece9d42bb+0(SB) 0x0000 00000 (p.go:3) FUNCDATA $1,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB) 0x0000 00000 (p.go:4) MOVQ "".i+8(FP),BX 0x0005 00005 (p.go:4) MOVQ "".j+16(FP),BP 0x000a 00010 (p.go:4) CMPQ BX,BP 0x000d 00013 (p.go:4) JEQ ,21 0x000f 00015 (p.go:4) MOVB $0,"".~r2+24(FP) 0x0014 00020 (p.go:4) RET , 0x0015 00021 (p.go:4) MOVB $1,"".~r2+24(FP) 0x001a 00026 (p.go:4) JMP ,20 After "".f t=1 size=32 value=0 args=0x18 locals=0x0 0x0000 00000 (p.go:3) TEXT "".f+0(SB),4,$0-24 0x0000 00000 (p.go:3) FUNCDATA $0,gclocals·f90cfd099b5ec2b453c391fece9d42bb+0(SB) 0x0000 00000 (p.go:3) FUNCDATA $1,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB) 0x0000 00000 (p.go:4) MOVQ "".i+8(FP),BX 0x0005 00005 (p.go:4) MOVQ "".j+16(FP),BP 0x000a 00010 (p.go:4) CMPQ BX,BP 0x000d 00013 (p.go:4) SETEQ ,"".~r2+24(FP) 0x0012 00018 (p.go:4) RET , Top movers in stdlib benchmarks: benchmark old ns/op new ns/op delta BenchmarkNextafter64 7.72 5.86 -24.09% BenchmarkSignbit 2.13 2.78 +30.52% BenchmarkNextafter32 7.98 6.17 -22.68% BenchmarkContendedSemaphore 72.0 86.1 +19.58% BenchmarkIndexByte32 6.94 8.28 +19.31% BenchmarkIndexRuneFastPath 27.3 22.9 -16.12% BenchmarkUnquoteEasy 190 217 +14.21% BenchmarkCompareBytesToNil 6.92 6.13 -11.42% BenchmarkComplex128DivNisNaN 11.4 10.1 -11.40% BenchmarkEqual16 7.18 6.38 -11.14% BenchmarkEqual9 7.18 6.38 -11.14% BenchmarkEqual15 7.18 6.41 -10.72% BenchmarkSearchWrappers 162 145 -10.49% BenchmarkEqual20 8.01 7.18 -10.36% BenchmarkCompareBytesBigIdentical 5.32 4.79 -9.96% BenchmarkInterfaceSmall 15.2 13.8 -9.21% BenchmarkEqual32 8.81 8.00 -9.19% BenchmarkTrimSpace 52.7 58.0 +10.06% BenchmarkCompareBytesEmpty 5.87 5.34 -9.03% BenchmarkMapStringKeysEight_64 21.8 23.9 +9.63% BenchmarkMapStringKeysEight_16 22.8 20.8 -8.77% BenchmarkMapStringKeysEight_1M 21.9 24.0 +9.59% BenchmarkMapStringKeysEight_32 21.9 24.0 +9.59% BenchmarkCompareBytesIdentical 5.85 5.34 -8.72% BenchmarkAcosh 32.2 35.2 +9.32% BenchmarkMin 3.20 2.93 -8.44% BenchmarkIPString 2120 1954 -7.83% BenchmarkChanSem 61.9 57.3 -7.43% BenchmarkCompareBytesDifferentLength 6.91 6.40 -7.38% BenchmarkCompareBytesEqual 6.91 6.40 -7.38% BenchmarkCompareBytesSameLength 6.92 6.41 -7.37% All regressions I've investigated are due to incidental code movement. The godoc binary is ~0.2% smaller after this CL. Updates golang#5729. Other architectures will be done in subsequent CLs. Change-Id: I0e167e259274b722958567fc0af83a17ca002da7
Use SETcc instructions instead of Jcc to generate boolean values. This generates shorter, jump-free code, which may in turn enable other peephole optimizations. For example, given func f(i, j int) bool { return i == j } Before "".f t=1 size=32 value=0 args=0x18 locals=0x0 0x0000 00000 (x.go:3) TEXT "".f(SB), $0-24 0x0000 00000 (x.go:3) FUNCDATA $0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB) 0x0000 00000 (x.go:3) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:4) MOVQ "".i+8(FP), BX 0x0005 00005 (x.go:4) MOVQ "".j+16(FP), BP 0x000a 00010 (x.go:4) CMPQ BX, BP 0x000d 00013 (x.go:4) JEQ 21 0x000f 00015 (x.go:4) MOVB $0, "".~r2+24(FP) 0x0014 00020 (x.go:4) RET 0x0015 00021 (x.go:4) MOVB $1, "".~r2+24(FP) 0x001a 00026 (x.go:4) JMP 20 After "".f t=1 size=32 value=0 args=0x18 locals=0x0 0x0000 00000 (x.go:3) TEXT "".f(SB), $0-24 0x0000 00000 (x.go:3) FUNCDATA $0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB) 0x0000 00000 (x.go:3) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:4) MOVQ "".i+8(FP), BX 0x0005 00005 (x.go:4) MOVQ "".j+16(FP), BP 0x000a 00010 (x.go:4) CMPQ BX, BP 0x000d 00013 (x.go:4) SETEQ "".~r2+24(FP) 0x0012 00018 (x.go:4) RET regexp benchmarks, best of 12 runs: benchmark old ns/op new ns/op delta BenchmarkNotOnePassShortB 782 733 -6.27% BenchmarkLiteral 180 171 -5.00% BenchmarkNotLiteral 2855 2721 -4.69% BenchmarkMatchHard_32 2672 2557 -4.30% BenchmarkMatchHard_1K 80182 76732 -4.30% BenchmarkMatchEasy1_32M 76440180 73304748 -4.10% BenchmarkMatchEasy1_32K 68798 66350 -3.56% BenchmarkAnchoredLongMatch 482 465 -3.53% BenchmarkMatchEasy1_1M 2373042 2292692 -3.39% BenchmarkReplaceAll 2776 2690 -3.10% BenchmarkNotOnePassShortA 1397 1360 -2.65% BenchmarkMatchClass_InRange 3842 3742 -2.60% BenchmarkMatchEasy0_32 125 122 -2.40% BenchmarkMatchEasy0_32K 11414 11164 -2.19% BenchmarkMatchEasy0_1K 668 654 -2.10% BenchmarkAnchoredShortMatch 260 255 -1.92% BenchmarkAnchoredLiteralShortNonMatch 164 161 -1.83% BenchmarkOnePassShortB 623 612 -1.77% BenchmarkOnePassShortA 801 788 -1.62% BenchmarkMatchClass 4094 4033 -1.49% BenchmarkMatchEasy0_32M 14078800 13890704 -1.34% BenchmarkMatchHard_32K 4095844 4045820 -1.22% BenchmarkMatchEasy1_1K 1663 1643 -1.20% BenchmarkMatchHard_1M 131261708 129708215 -1.18% BenchmarkMatchHard_32M 4210112412 4169292003 -0.97% BenchmarkMatchMedium_32K 2460752 2438611 -0.90% BenchmarkMatchEasy0_1M 422914 419672 -0.77% BenchmarkMatchMedium_1M 78581121 78040160 -0.69% BenchmarkMatchMedium_32M 2515287278 2498464906 -0.67% BenchmarkMatchMedium_32 1754 1746 -0.46% BenchmarkMatchMedium_1K 52105 52106 +0.00% BenchmarkAnchoredLiteralLongNonMatch 185 185 +0.00% BenchmarkMatchEasy1_32 107 107 +0.00% BenchmarkOnePassLongNotPrefix 505 505 +0.00% BenchmarkOnePassLongPrefix 147 147 +0.00% The godoc binary is ~0.12% smaller after this CL. Updates golang#5729. toolstash -cmp passes for all architectures other than amd64 and amd64p32. Other architectures can be done in follow-up CLs. Change-Id: I0e167e259274b722958567fc0af83a17ca002da7
Use SETcc instructions instead of Jcc to generate boolean values. This generates shorter, jump-free code, which may in turn enable other peephole optimizations. For example, given func f(i, j int) bool { return i == j } Before "".f t=1 size=32 value=0 args=0x18 locals=0x0 0x0000 00000 (x.go:3) TEXT "".f(SB), $0-24 0x0000 00000 (x.go:3) FUNCDATA $0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB) 0x0000 00000 (x.go:3) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:4) MOVQ "".i+8(FP), BX 0x0005 00005 (x.go:4) MOVQ "".j+16(FP), BP 0x000a 00010 (x.go:4) CMPQ BX, BP 0x000d 00013 (x.go:4) JEQ 21 0x000f 00015 (x.go:4) MOVB $0, "".~r2+24(FP) 0x0014 00020 (x.go:4) RET 0x0015 00021 (x.go:4) MOVB $1, "".~r2+24(FP) 0x001a 00026 (x.go:4) JMP 20 After "".f t=1 size=32 value=0 args=0x18 locals=0x0 0x0000 00000 (x.go:3) TEXT "".f(SB), $0-24 0x0000 00000 (x.go:3) FUNCDATA $0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB) 0x0000 00000 (x.go:3) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:4) MOVQ "".i+8(FP), BX 0x0005 00005 (x.go:4) MOVQ "".j+16(FP), BP 0x000a 00010 (x.go:4) CMPQ BX, BP 0x000d 00013 (x.go:4) SETEQ "".~r2+24(FP) 0x0012 00018 (x.go:4) RET regexp benchmarks, best of 12 runs: benchmark old ns/op new ns/op delta BenchmarkNotOnePassShortB 782 733 -6.27% BenchmarkLiteral 180 171 -5.00% BenchmarkNotLiteral 2855 2721 -4.69% BenchmarkMatchHard_32 2672 2557 -4.30% BenchmarkMatchHard_1K 80182 76732 -4.30% BenchmarkMatchEasy1_32M 76440180 73304748 -4.10% BenchmarkMatchEasy1_32K 68798 66350 -3.56% BenchmarkAnchoredLongMatch 482 465 -3.53% BenchmarkMatchEasy1_1M 2373042 2292692 -3.39% BenchmarkReplaceAll 2776 2690 -3.10% BenchmarkNotOnePassShortA 1397 1360 -2.65% BenchmarkMatchClass_InRange 3842 3742 -2.60% BenchmarkMatchEasy0_32 125 122 -2.40% BenchmarkMatchEasy0_32K 11414 11164 -2.19% BenchmarkMatchEasy0_1K 668 654 -2.10% BenchmarkAnchoredShortMatch 260 255 -1.92% BenchmarkAnchoredLiteralShortNonMatch 164 161 -1.83% BenchmarkOnePassShortB 623 612 -1.77% BenchmarkOnePassShortA 801 788 -1.62% BenchmarkMatchClass 4094 4033 -1.49% BenchmarkMatchEasy0_32M 14078800 13890704 -1.34% BenchmarkMatchHard_32K 4095844 4045820 -1.22% BenchmarkMatchEasy1_1K 1663 1643 -1.20% BenchmarkMatchHard_1M 131261708 129708215 -1.18% BenchmarkMatchHard_32M 4210112412 4169292003 -0.97% BenchmarkMatchMedium_32K 2460752 2438611 -0.90% BenchmarkMatchEasy0_1M 422914 419672 -0.77% BenchmarkMatchMedium_1M 78581121 78040160 -0.69% BenchmarkMatchMedium_32M 2515287278 2498464906 -0.67% BenchmarkMatchMedium_32 1754 1746 -0.46% BenchmarkMatchMedium_1K 52105 52106 +0.00% BenchmarkAnchoredLiteralLongNonMatch 185 185 +0.00% BenchmarkMatchEasy1_32 107 107 +0.00% BenchmarkOnePassLongNotPrefix 505 505 +0.00% BenchmarkOnePassLongPrefix 147 147 +0.00% The godoc binary is ~0.12% smaller after this CL. Updates golang#5729. toolstash -cmp passes for all architectures other than amd64 and amd64p32. Other architectures can be done in follow-up CLs. Change-Id: I0e167e259274b722958567fc0af83a17ca002da7
Use SETcc instructions instead of Jcc to generate boolean values. This generates shorter, jump-free code, which may in turn enable other peephole optimizations. For example, given func f(i, j int) bool { return i == j } Before "".f t=1 size=32 value=0 args=0x18 locals=0x0 0x0000 00000 (x.go:3) TEXT "".f(SB), $0-24 0x0000 00000 (x.go:3) FUNCDATA $0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB) 0x0000 00000 (x.go:3) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:4) MOVQ "".i+8(FP), BX 0x0005 00005 (x.go:4) MOVQ "".j+16(FP), BP 0x000a 00010 (x.go:4) CMPQ BX, BP 0x000d 00013 (x.go:4) JEQ 21 0x000f 00015 (x.go:4) MOVB $0, "".~r2+24(FP) 0x0014 00020 (x.go:4) RET 0x0015 00021 (x.go:4) MOVB $1, "".~r2+24(FP) 0x001a 00026 (x.go:4) JMP 20 After "".f t=1 size=32 value=0 args=0x18 locals=0x0 0x0000 00000 (x.go:3) TEXT "".f(SB), $0-24 0x0000 00000 (x.go:3) FUNCDATA $0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB) 0x0000 00000 (x.go:3) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (x.go:4) MOVQ "".i+8(FP), BX 0x0005 00005 (x.go:4) MOVQ "".j+16(FP), BP 0x000a 00010 (x.go:4) CMPQ BX, BP 0x000d 00013 (x.go:4) SETEQ "".~r2+24(FP) 0x0012 00018 (x.go:4) RET regexp benchmarks, best of 12 runs: benchmark old ns/op new ns/op delta BenchmarkNotOnePassShortB 782 733 -6.27% BenchmarkLiteral 180 171 -5.00% BenchmarkNotLiteral 2855 2721 -4.69% BenchmarkMatchHard_32 2672 2557 -4.30% BenchmarkMatchHard_1K 80182 76732 -4.30% BenchmarkMatchEasy1_32M 76440180 73304748 -4.10% BenchmarkMatchEasy1_32K 68798 66350 -3.56% BenchmarkAnchoredLongMatch 482 465 -3.53% BenchmarkMatchEasy1_1M 2373042 2292692 -3.39% BenchmarkReplaceAll 2776 2690 -3.10% BenchmarkNotOnePassShortA 1397 1360 -2.65% BenchmarkMatchClass_InRange 3842 3742 -2.60% BenchmarkMatchEasy0_32 125 122 -2.40% BenchmarkMatchEasy0_32K 11414 11164 -2.19% BenchmarkMatchEasy0_1K 668 654 -2.10% BenchmarkAnchoredShortMatch 260 255 -1.92% BenchmarkAnchoredLiteralShortNonMatch 164 161 -1.83% BenchmarkOnePassShortB 623 612 -1.77% BenchmarkOnePassShortA 801 788 -1.62% BenchmarkMatchClass 4094 4033 -1.49% BenchmarkMatchEasy0_32M 14078800 13890704 -1.34% BenchmarkMatchHard_32K 4095844 4045820 -1.22% BenchmarkMatchEasy1_1K 1663 1643 -1.20% BenchmarkMatchHard_1M 131261708 129708215 -1.18% BenchmarkMatchHard_32M 4210112412 4169292003 -0.97% BenchmarkMatchMedium_32K 2460752 2438611 -0.90% BenchmarkMatchEasy0_1M 422914 419672 -0.77% BenchmarkMatchMedium_1M 78581121 78040160 -0.69% BenchmarkMatchMedium_32M 2515287278 2498464906 -0.67% BenchmarkMatchMedium_32 1754 1746 -0.46% BenchmarkMatchMedium_1K 52105 52106 +0.00% BenchmarkAnchoredLiteralLongNonMatch 185 185 +0.00% BenchmarkMatchEasy1_32 107 107 +0.00% BenchmarkOnePassLongNotPrefix 505 505 +0.00% BenchmarkOnePassLongPrefix 147 147 +0.00% The godoc binary is ~0.12% smaller after this CL. Updates #5729. toolstash -cmp passes for all architectures other than amd64 and amd64p32. Other architectures can be done in follow-up CLs. Change-Id: I0e167e259274b722958567fc0af83a17ca002da7 Reviewed-on: https://go-review.googlesource.com/2284 Reviewed-by: Russ Cox <rsc@golang.org>
Bug author's results: bench and notice the difference between an int that gets xor with 1 and a bool that gets go test bench=".*" My results: go test bench=".*" there is no difference between an int that gets xor with 1 and a bool that gets Looks like dev.ssa recognizes bool flip. |
Note that your test code doesn't demonstrate what you think it does. What you're measuring is the fact that SSA notices that your bool is dead and removes the code for flag=!flag altogether. But put in a use of the resulting bool, and you'll see that SSA does use ^1 for bool flip. |
by martisch@uos.de:
The text was updated successfully, but these errors were encountered: