Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: silly amounts of code generated for simple struct equality tests #15303

Closed
josharian opened this issue Apr 14, 2016 · 6 comments
Closed

Comments

@josharian
Copy link
Contributor

[partial move from #15164]

package x

type T struct {
    a, b int8
}

func g(a T) bool {
    return a == T{1, 2}
}

With ssa, this generates the following code:

"".g t=1 size=80 args=0x10 locals=0x8
    0x0000 00000 (swtopt.go:23) TEXT    "".g(SB), $8-16
    0x0000 00000 (swtopt.go:23) SUBQ    $8, SP
    0x0004 00004 (swtopt.go:23) FUNCDATA    $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
    0x0004 00004 (swtopt.go:23) FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0004 00004 (swtopt.go:24) MOVBLZX "".a+16(FP), AX
    0x0009 00009 (swtopt.go:24) MOVB    AL, "".autotmp_16+6(SP)
    0x000d 00013 (swtopt.go:24) MOVBLZX "".a+17(FP), AX
    0x0012 00018 (swtopt.go:24) MOVB    AL, "".autotmp_16+7(SP)
    0x0016 00022 (swtopt.go:24) MOVB    $0, "".autotmp_17+4(SP)
    0x001b 00027 (swtopt.go:24) MOVB    $1, "".autotmp_17+4(SP)
    0x0020 00032 (swtopt.go:24) MOVB    $2, "".autotmp_17+5(SP)
    0x0025 00037 (swtopt.go:24) MOVBLZX "".autotmp_16+6(SP), AX
    0x002a 00042 (swtopt.go:24) MOVBLZX "".autotmp_17+4(SP), CX
    0x002f 00047 (swtopt.go:24) CMPB    AL, CL
    0x0031 00049 (swtopt.go:24) JNE 70
    0x0033 00051 (swtopt.go:24) MOVBLZX "".autotmp_16+7(SP), AX
    0x0038 00056 (swtopt.go:24) CMPB    AL, $2
    0x003a 00058 (swtopt.go:24) SETEQ   AL
    0x003d 00061 (swtopt.go:24) MOVB    AL, "".~r1+24(FP)
    0x0041 00065 (swtopt.go:24) ADDQ    $8, SP
    0x0045 00069 (swtopt.go:24) RET
    0x0046 00070 (swtopt.go:24) MOVB    $0, AL
    0x0048 00072 (swtopt.go:24) JMP 61

This is better than the old backend, but still way too much. Some of the fat is due to #15300, but there's still an unnecessary zero at 0x0016 and unnecessary spills.

The unnecessary zero is introduced during the decompose user phase.

The unnecessary spills are probably due to how the front end does small struct equality, but I haven't double-checked.

cc @mdempsky @brtzsnr @randall77

@josharian
Copy link
Contributor Author

CL 22277 mostly addresses this, but it doesn't explain or fix the unnecessary zeroing introduced during the decompose phase, which I suspect could still happen via another code path. For example, it also used to happen when appending arrays/structs prior to CL 22248. Maybe @brtzsnr would be interested in looking into that?

@gopherbot
Copy link

CL https://golang.org/cl/22277 mentions this issue.

@josharian
Copy link
Contributor Author

CL 22277 is on hold until SSA is the default everywhere. In the meantime, hopefully @brtzsnr will succeed in making SSA do a better job with what we've got.

@brtzsnr
Copy link
Contributor

brtzsnr commented Apr 22, 2016

Not sure I can do anything simple here. I looked into a == T{1, 2} example and the tradeoff seems to be between 1 cmp + 1 jmp + 1/2 cmp and 2 cmp + 1 and8. If the cmp (ints) is cheap we can do the latter. Otherwise (for strings) the former is faster.

Maybe one improvement is to order the comparisons of fields by cost. First ints, then string/slice length, then small arrays, etc.

@josharian
Copy link
Contributor Author

Thanks for checking into it, @brtzsnr, much appreciated.

josharian added a commit to josharian/go that referenced this issue May 9, 2016
DO NOT REVIEW

[There is no clear way to improve things for all backends.
So this CL is going to wait until we are SSA by default
everywhere.]

This CL reworks walkcompare for clarity and concision.
It also makes one significant functional change.
(The functional change is hard to separate cleanly
from the cleanup, so I just did them together.)
When inlining and unrolling an equality comparison
for a small struct or array, compare the elements like:

a[0] == b[0] && a[1] == b[1]

rather than

pa := &a
pb := &b
pa[0] == pb[0] && pa[1] == pb[1]

The result is the same, but taking the address
and working through the indirect
forces the backends to generate less efficient code.

Updates golang#15303


Sample code:

type T struct {
	a, b int8
}

func g(a T) bool {
	return a == T{1, 2}
}


SSA before:

"".g t=1 size=80 args=0x10 locals=0x8
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $8-16
	0x0000 00000 (badeq.go:7)	SUBQ	$8, SP
	0x0004 00004 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0004 00004 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0004 00004 (badeq.go:8)	MOVBLZX	"".a+16(FP), AX
	0x0009 00009 (badeq.go:8)	MOVB	AL, "".autotmp_0+6(SP)
	0x000d 00013 (badeq.go:8)	MOVBLZX	"".a+17(FP), AX
	0x0012 00018 (badeq.go:8)	MOVB	AL, "".autotmp_0+7(SP)
	0x0016 00022 (badeq.go:8)	MOVB	$0, "".autotmp_1+4(SP)
	0x001b 00027 (badeq.go:8)	MOVB	$1, "".autotmp_1+4(SP)
	0x0020 00032 (badeq.go:8)	MOVB	$2, "".autotmp_1+5(SP)
	0x0025 00037 (badeq.go:8)	MOVBLZX	"".autotmp_0+6(SP), AX
	0x002a 00042 (badeq.go:8)	MOVBLZX	"".autotmp_1+4(SP), CX
	0x002f 00047 (badeq.go:8)	CMPB	AL, CL
	0x0031 00049 (badeq.go:8)	JNE	70
	0x0033 00051 (badeq.go:8)	MOVBLZX	"".autotmp_0+7(SP), AX
	0x0038 00056 (badeq.go:8)	CMPB	AL, $2
	0x003a 00058 (badeq.go:8)	SETEQ	AL
	0x003d 00061 (badeq.go:8)	MOVB	AL, "".~r1+24(FP)
	0x0041 00065 (badeq.go:8)	ADDQ	$8, SP
	0x0045 00069 (badeq.go:8)	RET
	0x0046 00070 (badeq.go:8)	MOVB	$0, AL
	0x0048 00072 (badeq.go:8)	JMP	61

SSA after:

"".g t=1 size=32 args=0x10 locals=0x0
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $0-16
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0000 00000 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (badeq.go:8)	MOVBLZX	"".a+8(FP), AX
	0x0005 00005 (badeq.go:8)	CMPB	AL, $1
	0x0007 00007 (badeq.go:8)	JNE	25
	0x0009 00009 (badeq.go:8)	MOVBLZX	"".a+9(FP), CX
	0x000e 00014 (badeq.go:8)	CMPB	CL, $2
	0x0011 00017 (badeq.go:8)	SETEQ	AL
	0x0014 00020 (badeq.go:8)	MOVB	AL, "".~r1+16(FP)
	0x0018 00024 (badeq.go:8)	RET
	0x0019 00025 (badeq.go:8)	MOVB	$0, AL
	0x001b 00027 (badeq.go:8)	JMP	20

Old backend before:

"".g t=1 size=96 args=0x10 locals=0x8
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $8-16
	0x0000 00000 (badeq.go:7)	SUBQ	$8, SP
	0x0004 00004 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0004 00004 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0004 00004 (badeq.go:8)	MOVBQZX	"".a+16(FP), BX
	0x0009 00009 (badeq.go:8)	MOVB	BL, "".autotmp_0+6(SP)
	0x000d 00013 (badeq.go:8)	MOVBQZX	"".a+17(FP), BX
	0x0012 00018 (badeq.go:8)	MOVB	BL, "".autotmp_0+7(SP)
	0x0016 00022 (badeq.go:8)	MOVQ	$0, CX
	0x0018 00024 (badeq.go:8)	MOVB	CL, "".autotmp_1+4(SP)
	0x001c 00028 (badeq.go:8)	MOVB	CL, "".autotmp_1+5(SP)
	0x0020 00032 (badeq.go:8)	MOVB	$1, "".autotmp_1+4(SP)
	0x0025 00037 (badeq.go:8)	MOVB	$2, "".autotmp_1+5(SP)
	0x002a 00042 (badeq.go:8)	LEAQ	"".autotmp_0+6(SP), CX
	0x002f 00047 (badeq.go:8)	LEAQ	"".autotmp_1+4(SP), BX
	0x0034 00052 (badeq.go:8)	MOVQ	BX, AX
	0x0037 00055 (badeq.go:8)	NOP
	0x0037 00055 (badeq.go:8)	MOVBQZX	(CX), BX
	0x003a 00058 (badeq.go:8)	NOP
	0x003a 00058 (badeq.go:8)	MOVBQZX	(AX), BP
	0x003d 00061 (badeq.go:8)	CMPB	BL, BPB
	0x0040 00064 (badeq.go:8)	JNE	87
	0x0042 00066 (badeq.go:8)	NOP
	0x0042 00066 (badeq.go:8)	MOVBQZX	1(CX), BX
	0x0046 00070 (badeq.go:8)	NOP
	0x0046 00070 (badeq.go:8)	MOVBQZX	1(AX), BP
	0x004a 00074 (badeq.go:8)	CMPB	BL, BPB
	0x004d 00077 (badeq.go:8)	SETEQ	"".~r1+24(FP)
	0x0052 00082 (badeq.go:8)	ADDQ	$8, SP
	0x0056 00086 (badeq.go:8)	RET
	0x0057 00087 (badeq.go:8)	MOVB	$0, "".~r1+24(FP)
	0x005c 00092 (badeq.go:8)	JMP	82

Old backend after:

"".g t=1 size=64 args=0x10 locals=0x0
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $0-16
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0000 00000 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (badeq.go:8)	NOP
	0x0000 00000 (badeq.go:8)	MOVBQZX	"".a+8(FP), BX
	0x0005 00005 (badeq.go:8)	MOVQ	BX, BP
	0x0008 00008 (badeq.go:8)	MOVBQZX	"".a+9(FP), BX
	0x000d 00013 (badeq.go:8)	MOVQ	BX, DX
	0x0010 00016 (badeq.go:8)	NOP
	0x0010 00016 (badeq.go:8)	MOVQ	$0, BX
	0x0012 00018 (badeq.go:8)	NOP
	0x0012 00018 (badeq.go:8)	MOVQ	$1, CX
	0x0019 00025 (badeq.go:8)	MOVQ	$2, AX
	0x0020 00032 (badeq.go:8)	CMPB	BPB, CL
	0x0023 00035 (badeq.go:8)	JNE	45
	0x0025 00037 (badeq.go:8)	CMPB	DL, AL
	0x0027 00039 (badeq.go:8)	SETEQ	"".~r1+16(FP)
	0x002c 00044 (badeq.go:8)	RET
	0x002d 00045 (badeq.go:8)	MOVB	$0, "".~r1+16(FP)
	0x0032 00050 (badeq.go:8)	JMP	44


Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86
josharian added a commit to josharian/go that referenced this issue May 11, 2016
DO NOT REVIEW

[There is no clear way to improve things for all backends.
So this CL is going to wait until we are SSA by default
everywhere.]

This CL reworks walkcompare for clarity and concision.
It also makes one significant functional change.
(The functional change is hard to separate cleanly
from the cleanup, so I just did them together.)
When inlining and unrolling an equality comparison
for a small struct or array, compare the elements like:

a[0] == b[0] && a[1] == b[1]

rather than

pa := &a
pb := &b
pa[0] == pb[0] && pa[1] == pb[1]

The result is the same, but taking the address
and working through the indirect
forces the backends to generate less efficient code.

Updates golang#15303


Sample code:

type T struct {
	a, b int8
}

func g(a T) bool {
	return a == T{1, 2}
}


SSA before:

"".g t=1 size=80 args=0x10 locals=0x8
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $8-16
	0x0000 00000 (badeq.go:7)	SUBQ	$8, SP
	0x0004 00004 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0004 00004 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0004 00004 (badeq.go:8)	MOVBLZX	"".a+16(FP), AX
	0x0009 00009 (badeq.go:8)	MOVB	AL, "".autotmp_0+6(SP)
	0x000d 00013 (badeq.go:8)	MOVBLZX	"".a+17(FP), AX
	0x0012 00018 (badeq.go:8)	MOVB	AL, "".autotmp_0+7(SP)
	0x0016 00022 (badeq.go:8)	MOVB	$0, "".autotmp_1+4(SP)
	0x001b 00027 (badeq.go:8)	MOVB	$1, "".autotmp_1+4(SP)
	0x0020 00032 (badeq.go:8)	MOVB	$2, "".autotmp_1+5(SP)
	0x0025 00037 (badeq.go:8)	MOVBLZX	"".autotmp_0+6(SP), AX
	0x002a 00042 (badeq.go:8)	MOVBLZX	"".autotmp_1+4(SP), CX
	0x002f 00047 (badeq.go:8)	CMPB	AL, CL
	0x0031 00049 (badeq.go:8)	JNE	70
	0x0033 00051 (badeq.go:8)	MOVBLZX	"".autotmp_0+7(SP), AX
	0x0038 00056 (badeq.go:8)	CMPB	AL, $2
	0x003a 00058 (badeq.go:8)	SETEQ	AL
	0x003d 00061 (badeq.go:8)	MOVB	AL, "".~r1+24(FP)
	0x0041 00065 (badeq.go:8)	ADDQ	$8, SP
	0x0045 00069 (badeq.go:8)	RET
	0x0046 00070 (badeq.go:8)	MOVB	$0, AL
	0x0048 00072 (badeq.go:8)	JMP	61

SSA after:

"".g t=1 size=32 args=0x10 locals=0x0
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $0-16
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0000 00000 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (badeq.go:8)	MOVBLZX	"".a+8(FP), AX
	0x0005 00005 (badeq.go:8)	CMPB	AL, $1
	0x0007 00007 (badeq.go:8)	JNE	25
	0x0009 00009 (badeq.go:8)	MOVBLZX	"".a+9(FP), CX
	0x000e 00014 (badeq.go:8)	CMPB	CL, $2
	0x0011 00017 (badeq.go:8)	SETEQ	AL
	0x0014 00020 (badeq.go:8)	MOVB	AL, "".~r1+16(FP)
	0x0018 00024 (badeq.go:8)	RET
	0x0019 00025 (badeq.go:8)	MOVB	$0, AL
	0x001b 00027 (badeq.go:8)	JMP	20

Old backend before:

"".g t=1 size=96 args=0x10 locals=0x8
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $8-16
	0x0000 00000 (badeq.go:7)	SUBQ	$8, SP
	0x0004 00004 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0004 00004 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0004 00004 (badeq.go:8)	MOVBQZX	"".a+16(FP), BX
	0x0009 00009 (badeq.go:8)	MOVB	BL, "".autotmp_0+6(SP)
	0x000d 00013 (badeq.go:8)	MOVBQZX	"".a+17(FP), BX
	0x0012 00018 (badeq.go:8)	MOVB	BL, "".autotmp_0+7(SP)
	0x0016 00022 (badeq.go:8)	MOVQ	$0, CX
	0x0018 00024 (badeq.go:8)	MOVB	CL, "".autotmp_1+4(SP)
	0x001c 00028 (badeq.go:8)	MOVB	CL, "".autotmp_1+5(SP)
	0x0020 00032 (badeq.go:8)	MOVB	$1, "".autotmp_1+4(SP)
	0x0025 00037 (badeq.go:8)	MOVB	$2, "".autotmp_1+5(SP)
	0x002a 00042 (badeq.go:8)	LEAQ	"".autotmp_0+6(SP), CX
	0x002f 00047 (badeq.go:8)	LEAQ	"".autotmp_1+4(SP), BX
	0x0034 00052 (badeq.go:8)	MOVQ	BX, AX
	0x0037 00055 (badeq.go:8)	NOP
	0x0037 00055 (badeq.go:8)	MOVBQZX	(CX), BX
	0x003a 00058 (badeq.go:8)	NOP
	0x003a 00058 (badeq.go:8)	MOVBQZX	(AX), BP
	0x003d 00061 (badeq.go:8)	CMPB	BL, BPB
	0x0040 00064 (badeq.go:8)	JNE	87
	0x0042 00066 (badeq.go:8)	NOP
	0x0042 00066 (badeq.go:8)	MOVBQZX	1(CX), BX
	0x0046 00070 (badeq.go:8)	NOP
	0x0046 00070 (badeq.go:8)	MOVBQZX	1(AX), BP
	0x004a 00074 (badeq.go:8)	CMPB	BL, BPB
	0x004d 00077 (badeq.go:8)	SETEQ	"".~r1+24(FP)
	0x0052 00082 (badeq.go:8)	ADDQ	$8, SP
	0x0056 00086 (badeq.go:8)	RET
	0x0057 00087 (badeq.go:8)	MOVB	$0, "".~r1+24(FP)
	0x005c 00092 (badeq.go:8)	JMP	82

Old backend after:

"".g t=1 size=64 args=0x10 locals=0x0
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $0-16
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0000 00000 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (badeq.go:8)	NOP
	0x0000 00000 (badeq.go:8)	MOVBQZX	"".a+8(FP), BX
	0x0005 00005 (badeq.go:8)	MOVQ	BX, BP
	0x0008 00008 (badeq.go:8)	MOVBQZX	"".a+9(FP), BX
	0x000d 00013 (badeq.go:8)	MOVQ	BX, DX
	0x0010 00016 (badeq.go:8)	NOP
	0x0010 00016 (badeq.go:8)	MOVQ	$0, BX
	0x0012 00018 (badeq.go:8)	NOP
	0x0012 00018 (badeq.go:8)	MOVQ	$1, CX
	0x0019 00025 (badeq.go:8)	MOVQ	$2, AX
	0x0020 00032 (badeq.go:8)	CMPB	BPB, CL
	0x0023 00035 (badeq.go:8)	JNE	45
	0x0025 00037 (badeq.go:8)	CMPB	DL, AL
	0x0027 00039 (badeq.go:8)	SETEQ	"".~r1+16(FP)
	0x002c 00044 (badeq.go:8)	RET
	0x002d 00045 (badeq.go:8)	MOVB	$0, "".~r1+16(FP)
	0x0032 00050 (badeq.go:8)	JMP	44


Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86
josharian added a commit to josharian/go that referenced this issue May 30, 2016
DO NOT REVIEW

[There is no clear way to improve things for all backends.
So this CL is going to wait until we are SSA by default
everywhere.]

This CL reworks walkcompare for clarity and concision.
It also makes one significant functional change.
(The functional change is hard to separate cleanly
from the cleanup, so I just did them together.)
When inlining and unrolling an equality comparison
for a small struct or array, compare the elements like:

a[0] == b[0] && a[1] == b[1]

rather than

pa := &a
pb := &b
pa[0] == pb[0] && pa[1] == pb[1]

The result is the same, but taking the address
and working through the indirect
forces the backends to generate less efficient code.

Updates golang#15303


Sample code:

type T struct {
	a, b int8
}

func g(a T) bool {
	return a == T{1, 2}
}


SSA before:

"".g t=1 size=80 args=0x10 locals=0x8
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $8-16
	0x0000 00000 (badeq.go:7)	SUBQ	$8, SP
	0x0004 00004 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0004 00004 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0004 00004 (badeq.go:8)	MOVBLZX	"".a+16(FP), AX
	0x0009 00009 (badeq.go:8)	MOVB	AL, "".autotmp_0+6(SP)
	0x000d 00013 (badeq.go:8)	MOVBLZX	"".a+17(FP), AX
	0x0012 00018 (badeq.go:8)	MOVB	AL, "".autotmp_0+7(SP)
	0x0016 00022 (badeq.go:8)	MOVB	$0, "".autotmp_1+4(SP)
	0x001b 00027 (badeq.go:8)	MOVB	$1, "".autotmp_1+4(SP)
	0x0020 00032 (badeq.go:8)	MOVB	$2, "".autotmp_1+5(SP)
	0x0025 00037 (badeq.go:8)	MOVBLZX	"".autotmp_0+6(SP), AX
	0x002a 00042 (badeq.go:8)	MOVBLZX	"".autotmp_1+4(SP), CX
	0x002f 00047 (badeq.go:8)	CMPB	AL, CL
	0x0031 00049 (badeq.go:8)	JNE	70
	0x0033 00051 (badeq.go:8)	MOVBLZX	"".autotmp_0+7(SP), AX
	0x0038 00056 (badeq.go:8)	CMPB	AL, $2
	0x003a 00058 (badeq.go:8)	SETEQ	AL
	0x003d 00061 (badeq.go:8)	MOVB	AL, "".~r1+24(FP)
	0x0041 00065 (badeq.go:8)	ADDQ	$8, SP
	0x0045 00069 (badeq.go:8)	RET
	0x0046 00070 (badeq.go:8)	MOVB	$0, AL
	0x0048 00072 (badeq.go:8)	JMP	61

SSA after:

"".g t=1 size=32 args=0x10 locals=0x0
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $0-16
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0000 00000 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (badeq.go:8)	MOVBLZX	"".a+8(FP), AX
	0x0005 00005 (badeq.go:8)	CMPB	AL, $1
	0x0007 00007 (badeq.go:8)	JNE	25
	0x0009 00009 (badeq.go:8)	MOVBLZX	"".a+9(FP), CX
	0x000e 00014 (badeq.go:8)	CMPB	CL, $2
	0x0011 00017 (badeq.go:8)	SETEQ	AL
	0x0014 00020 (badeq.go:8)	MOVB	AL, "".~r1+16(FP)
	0x0018 00024 (badeq.go:8)	RET
	0x0019 00025 (badeq.go:8)	MOVB	$0, AL
	0x001b 00027 (badeq.go:8)	JMP	20

Old backend before:

"".g t=1 size=96 args=0x10 locals=0x8
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $8-16
	0x0000 00000 (badeq.go:7)	SUBQ	$8, SP
	0x0004 00004 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0004 00004 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0004 00004 (badeq.go:8)	MOVBQZX	"".a+16(FP), BX
	0x0009 00009 (badeq.go:8)	MOVB	BL, "".autotmp_0+6(SP)
	0x000d 00013 (badeq.go:8)	MOVBQZX	"".a+17(FP), BX
	0x0012 00018 (badeq.go:8)	MOVB	BL, "".autotmp_0+7(SP)
	0x0016 00022 (badeq.go:8)	MOVQ	$0, CX
	0x0018 00024 (badeq.go:8)	MOVB	CL, "".autotmp_1+4(SP)
	0x001c 00028 (badeq.go:8)	MOVB	CL, "".autotmp_1+5(SP)
	0x0020 00032 (badeq.go:8)	MOVB	$1, "".autotmp_1+4(SP)
	0x0025 00037 (badeq.go:8)	MOVB	$2, "".autotmp_1+5(SP)
	0x002a 00042 (badeq.go:8)	LEAQ	"".autotmp_0+6(SP), CX
	0x002f 00047 (badeq.go:8)	LEAQ	"".autotmp_1+4(SP), BX
	0x0034 00052 (badeq.go:8)	MOVQ	BX, AX
	0x0037 00055 (badeq.go:8)	NOP
	0x0037 00055 (badeq.go:8)	MOVBQZX	(CX), BX
	0x003a 00058 (badeq.go:8)	NOP
	0x003a 00058 (badeq.go:8)	MOVBQZX	(AX), BP
	0x003d 00061 (badeq.go:8)	CMPB	BL, BPB
	0x0040 00064 (badeq.go:8)	JNE	87
	0x0042 00066 (badeq.go:8)	NOP
	0x0042 00066 (badeq.go:8)	MOVBQZX	1(CX), BX
	0x0046 00070 (badeq.go:8)	NOP
	0x0046 00070 (badeq.go:8)	MOVBQZX	1(AX), BP
	0x004a 00074 (badeq.go:8)	CMPB	BL, BPB
	0x004d 00077 (badeq.go:8)	SETEQ	"".~r1+24(FP)
	0x0052 00082 (badeq.go:8)	ADDQ	$8, SP
	0x0056 00086 (badeq.go:8)	RET
	0x0057 00087 (badeq.go:8)	MOVB	$0, "".~r1+24(FP)
	0x005c 00092 (badeq.go:8)	JMP	82

Old backend after:

"".g t=1 size=64 args=0x10 locals=0x0
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $0-16
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0000 00000 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (badeq.go:8)	NOP
	0x0000 00000 (badeq.go:8)	MOVBQZX	"".a+8(FP), BX
	0x0005 00005 (badeq.go:8)	MOVQ	BX, BP
	0x0008 00008 (badeq.go:8)	MOVBQZX	"".a+9(FP), BX
	0x000d 00013 (badeq.go:8)	MOVQ	BX, DX
	0x0010 00016 (badeq.go:8)	NOP
	0x0010 00016 (badeq.go:8)	MOVQ	$0, BX
	0x0012 00018 (badeq.go:8)	NOP
	0x0012 00018 (badeq.go:8)	MOVQ	$1, CX
	0x0019 00025 (badeq.go:8)	MOVQ	$2, AX
	0x0020 00032 (badeq.go:8)	CMPB	BPB, CL
	0x0023 00035 (badeq.go:8)	JNE	45
	0x0025 00037 (badeq.go:8)	CMPB	DL, AL
	0x0027 00039 (badeq.go:8)	SETEQ	"".~r1+16(FP)
	0x002c 00044 (badeq.go:8)	RET
	0x002d 00045 (badeq.go:8)	MOVB	$0, "".~r1+16(FP)
	0x0032 00050 (badeq.go:8)	JMP	44


Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86
josharian added a commit to josharian/go that referenced this issue Jun 17, 2016
DO NOT REVIEW

[There is no clear way to improve things for all backends.
So this CL is going to wait until we are SSA by default
everywhere.]

This CL reworks walkcompare for clarity and concision.
It also makes one significant functional change.
(The functional change is hard to separate cleanly
from the cleanup, so I just did them together.)
When inlining and unrolling an equality comparison
for a small struct or array, compare the elements like:

a[0] == b[0] && a[1] == b[1]

rather than

pa := &a
pb := &b
pa[0] == pb[0] && pa[1] == pb[1]

The result is the same, but taking the address
and working through the indirect
forces the backends to generate less efficient code.

Updates golang#15303


Sample code:

type T struct {
	a, b int8
}

func g(a T) bool {
	return a == T{1, 2}
}


SSA before:

"".g t=1 size=80 args=0x10 locals=0x8
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $8-16
	0x0000 00000 (badeq.go:7)	SUBQ	$8, SP
	0x0004 00004 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0004 00004 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0004 00004 (badeq.go:8)	MOVBLZX	"".a+16(FP), AX
	0x0009 00009 (badeq.go:8)	MOVB	AL, "".autotmp_0+6(SP)
	0x000d 00013 (badeq.go:8)	MOVBLZX	"".a+17(FP), AX
	0x0012 00018 (badeq.go:8)	MOVB	AL, "".autotmp_0+7(SP)
	0x0016 00022 (badeq.go:8)	MOVB	$0, "".autotmp_1+4(SP)
	0x001b 00027 (badeq.go:8)	MOVB	$1, "".autotmp_1+4(SP)
	0x0020 00032 (badeq.go:8)	MOVB	$2, "".autotmp_1+5(SP)
	0x0025 00037 (badeq.go:8)	MOVBLZX	"".autotmp_0+6(SP), AX
	0x002a 00042 (badeq.go:8)	MOVBLZX	"".autotmp_1+4(SP), CX
	0x002f 00047 (badeq.go:8)	CMPB	AL, CL
	0x0031 00049 (badeq.go:8)	JNE	70
	0x0033 00051 (badeq.go:8)	MOVBLZX	"".autotmp_0+7(SP), AX
	0x0038 00056 (badeq.go:8)	CMPB	AL, $2
	0x003a 00058 (badeq.go:8)	SETEQ	AL
	0x003d 00061 (badeq.go:8)	MOVB	AL, "".~r1+24(FP)
	0x0041 00065 (badeq.go:8)	ADDQ	$8, SP
	0x0045 00069 (badeq.go:8)	RET
	0x0046 00070 (badeq.go:8)	MOVB	$0, AL
	0x0048 00072 (badeq.go:8)	JMP	61

SSA after:

"".g t=1 size=32 args=0x10 locals=0x0
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $0-16
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0000 00000 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (badeq.go:8)	MOVBLZX	"".a+8(FP), AX
	0x0005 00005 (badeq.go:8)	CMPB	AL, $1
	0x0007 00007 (badeq.go:8)	JNE	25
	0x0009 00009 (badeq.go:8)	MOVBLZX	"".a+9(FP), CX
	0x000e 00014 (badeq.go:8)	CMPB	CL, $2
	0x0011 00017 (badeq.go:8)	SETEQ	AL
	0x0014 00020 (badeq.go:8)	MOVB	AL, "".~r1+16(FP)
	0x0018 00024 (badeq.go:8)	RET
	0x0019 00025 (badeq.go:8)	MOVB	$0, AL
	0x001b 00027 (badeq.go:8)	JMP	20

Old backend before:

"".g t=1 size=96 args=0x10 locals=0x8
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $8-16
	0x0000 00000 (badeq.go:7)	SUBQ	$8, SP
	0x0004 00004 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0004 00004 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0004 00004 (badeq.go:8)	MOVBQZX	"".a+16(FP), BX
	0x0009 00009 (badeq.go:8)	MOVB	BL, "".autotmp_0+6(SP)
	0x000d 00013 (badeq.go:8)	MOVBQZX	"".a+17(FP), BX
	0x0012 00018 (badeq.go:8)	MOVB	BL, "".autotmp_0+7(SP)
	0x0016 00022 (badeq.go:8)	MOVQ	$0, CX
	0x0018 00024 (badeq.go:8)	MOVB	CL, "".autotmp_1+4(SP)
	0x001c 00028 (badeq.go:8)	MOVB	CL, "".autotmp_1+5(SP)
	0x0020 00032 (badeq.go:8)	MOVB	$1, "".autotmp_1+4(SP)
	0x0025 00037 (badeq.go:8)	MOVB	$2, "".autotmp_1+5(SP)
	0x002a 00042 (badeq.go:8)	LEAQ	"".autotmp_0+6(SP), CX
	0x002f 00047 (badeq.go:8)	LEAQ	"".autotmp_1+4(SP), BX
	0x0034 00052 (badeq.go:8)	MOVQ	BX, AX
	0x0037 00055 (badeq.go:8)	NOP
	0x0037 00055 (badeq.go:8)	MOVBQZX	(CX), BX
	0x003a 00058 (badeq.go:8)	NOP
	0x003a 00058 (badeq.go:8)	MOVBQZX	(AX), BP
	0x003d 00061 (badeq.go:8)	CMPB	BL, BPB
	0x0040 00064 (badeq.go:8)	JNE	87
	0x0042 00066 (badeq.go:8)	NOP
	0x0042 00066 (badeq.go:8)	MOVBQZX	1(CX), BX
	0x0046 00070 (badeq.go:8)	NOP
	0x0046 00070 (badeq.go:8)	MOVBQZX	1(AX), BP
	0x004a 00074 (badeq.go:8)	CMPB	BL, BPB
	0x004d 00077 (badeq.go:8)	SETEQ	"".~r1+24(FP)
	0x0052 00082 (badeq.go:8)	ADDQ	$8, SP
	0x0056 00086 (badeq.go:8)	RET
	0x0057 00087 (badeq.go:8)	MOVB	$0, "".~r1+24(FP)
	0x005c 00092 (badeq.go:8)	JMP	82

Old backend after:

"".g t=1 size=64 args=0x10 locals=0x0
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $0-16
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0000 00000 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (badeq.go:8)	NOP
	0x0000 00000 (badeq.go:8)	MOVBQZX	"".a+8(FP), BX
	0x0005 00005 (badeq.go:8)	MOVQ	BX, BP
	0x0008 00008 (badeq.go:8)	MOVBQZX	"".a+9(FP), BX
	0x000d 00013 (badeq.go:8)	MOVQ	BX, DX
	0x0010 00016 (badeq.go:8)	NOP
	0x0010 00016 (badeq.go:8)	MOVQ	$0, BX
	0x0012 00018 (badeq.go:8)	NOP
	0x0012 00018 (badeq.go:8)	MOVQ	$1, CX
	0x0019 00025 (badeq.go:8)	MOVQ	$2, AX
	0x0020 00032 (badeq.go:8)	CMPB	BPB, CL
	0x0023 00035 (badeq.go:8)	JNE	45
	0x0025 00037 (badeq.go:8)	CMPB	DL, AL
	0x0027 00039 (badeq.go:8)	SETEQ	"".~r1+16(FP)
	0x002c 00044 (badeq.go:8)	RET
	0x002d 00045 (badeq.go:8)	MOVB	$0, "".~r1+16(FP)
	0x0032 00050 (badeq.go:8)	JMP	44


Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86
gopherbot pushed a commit that referenced this issue Aug 25, 2016
This CL reworks walkcompare for clarity and concision.
It also makes one significant functional change.
(The functional change is hard to separate cleanly
from the cleanup, so I just did them together.)
When inlining and unrolling an equality comparison
for a small struct or array, compare the elements like:

a[0] == b[0] && a[1] == b[1]

rather than

pa := &a
pb := &b
pa[0] == pb[0] && pa[1] == pb[1]

The result is the same, but taking the address
and working through the indirect
forces the backends to generate less efficient code.

This is only an improvement with the SSA backend.
However, every port but s390x now has a working
SSA backend, and switching to the SSA backend
by default everywhere is a priority for Go 1.8.
It thus seems reasonable to start to prioritize
SSA performance over the old backend.

Updates #15303


Sample code:

type T struct {
	a, b int8
}

func g(a T) bool {
	return a == T{1, 2}
}


SSA before:

"".g t=1 size=80 args=0x10 locals=0x8
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $8-16
	0x0000 00000 (badeq.go:7)	SUBQ	$8, SP
	0x0004 00004 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0004 00004 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0004 00004 (badeq.go:8)	MOVBLZX	"".a+16(FP), AX
	0x0009 00009 (badeq.go:8)	MOVB	AL, "".autotmp_0+6(SP)
	0x000d 00013 (badeq.go:8)	MOVBLZX	"".a+17(FP), AX
	0x0012 00018 (badeq.go:8)	MOVB	AL, "".autotmp_0+7(SP)
	0x0016 00022 (badeq.go:8)	MOVB	$0, "".autotmp_1+4(SP)
	0x001b 00027 (badeq.go:8)	MOVB	$1, "".autotmp_1+4(SP)
	0x0020 00032 (badeq.go:8)	MOVB	$2, "".autotmp_1+5(SP)
	0x0025 00037 (badeq.go:8)	MOVBLZX	"".autotmp_0+6(SP), AX
	0x002a 00042 (badeq.go:8)	MOVBLZX	"".autotmp_1+4(SP), CX
	0x002f 00047 (badeq.go:8)	CMPB	AL, CL
	0x0031 00049 (badeq.go:8)	JNE	70
	0x0033 00051 (badeq.go:8)	MOVBLZX	"".autotmp_0+7(SP), AX
	0x0038 00056 (badeq.go:8)	CMPB	AL, $2
	0x003a 00058 (badeq.go:8)	SETEQ	AL
	0x003d 00061 (badeq.go:8)	MOVB	AL, "".~r1+24(FP)
	0x0041 00065 (badeq.go:8)	ADDQ	$8, SP
	0x0045 00069 (badeq.go:8)	RET
	0x0046 00070 (badeq.go:8)	MOVB	$0, AL
	0x0048 00072 (badeq.go:8)	JMP	61

SSA after:

"".g t=1 size=32 args=0x10 locals=0x0
	0x0000 00000 (badeq.go:7)	TEXT	"".g(SB), $0-16
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	NOP
	0x0000 00000 (badeq.go:7)	FUNCDATA	$0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
	0x0000 00000 (badeq.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (badeq.go:8)	MOVBLZX	"".a+8(FP), AX
	0x0005 00005 (badeq.go:8)	CMPB	AL, $1
	0x0007 00007 (badeq.go:8)	JNE	25
	0x0009 00009 (badeq.go:8)	MOVBLZX	"".a+9(FP), CX
	0x000e 00014 (badeq.go:8)	CMPB	CL, $2
	0x0011 00017 (badeq.go:8)	SETEQ	AL
	0x0014 00020 (badeq.go:8)	MOVB	AL, "".~r1+16(FP)
	0x0018 00024 (badeq.go:8)	RET
	0x0019 00025 (badeq.go:8)	MOVB	$0, AL
	0x001b 00027 (badeq.go:8)	JMP	20


Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86
Reviewed-on: https://go-review.googlesource.com/22277
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@randall77
Copy link
Contributor

This seems fixed. If there are any outstanding remnants (unnecessary zeroing?) you can open a different issue.

@golang golang locked and limited conversation to collaborators Aug 31, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants