-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: silly amounts of code generated for simple struct equality tests #15303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
CL 22277 mostly addresses this, but it doesn't explain or fix the unnecessary zeroing introduced during the decompose phase, which I suspect could still happen via another code path. For example, it also used to happen when appending arrays/structs prior to CL 22248. Maybe @brtzsnr would be interested in looking into that? |
CL https://golang.org/cl/22277 mentions this issue. |
CL 22277 is on hold until SSA is the default everywhere. In the meantime, hopefully @brtzsnr will succeed in making SSA do a better job with what we've got. |
Not sure I can do anything simple here. I looked into a == T{1, 2} example and the tradeoff seems to be between 1 cmp + 1 jmp + 1/2 cmp and 2 cmp + 1 and8. If the cmp (ints) is cheap we can do the latter. Otherwise (for strings) the former is faster. Maybe one improvement is to order the comparisons of fields by cost. First ints, then string/slice length, then small arrays, etc. |
Thanks for checking into it, @brtzsnr, much appreciated. |
DO NOT REVIEW [There is no clear way to improve things for all backends. So this CL is going to wait until we are SSA by default everywhere.] This CL reworks walkcompare for clarity and concision. It also makes one significant functional change. (The functional change is hard to separate cleanly from the cleanup, so I just did them together.) When inlining and unrolling an equality comparison for a small struct or array, compare the elements like: a[0] == b[0] && a[1] == b[1] rather than pa := &a pb := &b pa[0] == pb[0] && pa[1] == pb[1] The result is the same, but taking the address and working through the indirect forces the backends to generate less efficient code. Updates golang#15303 Sample code: type T struct { a, b int8 } func g(a T) bool { return a == T{1, 2} } SSA before: "".g t=1 size=80 args=0x10 locals=0x8 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $8-16 0x0000 00000 (badeq.go:7) SUBQ $8, SP 0x0004 00004 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0004 00004 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0004 00004 (badeq.go:8) MOVBLZX "".a+16(FP), AX 0x0009 00009 (badeq.go:8) MOVB AL, "".autotmp_0+6(SP) 0x000d 00013 (badeq.go:8) MOVBLZX "".a+17(FP), AX 0x0012 00018 (badeq.go:8) MOVB AL, "".autotmp_0+7(SP) 0x0016 00022 (badeq.go:8) MOVB $0, "".autotmp_1+4(SP) 0x001b 00027 (badeq.go:8) MOVB $1, "".autotmp_1+4(SP) 0x0020 00032 (badeq.go:8) MOVB $2, "".autotmp_1+5(SP) 0x0025 00037 (badeq.go:8) MOVBLZX "".autotmp_0+6(SP), AX 0x002a 00042 (badeq.go:8) MOVBLZX "".autotmp_1+4(SP), CX 0x002f 00047 (badeq.go:8) CMPB AL, CL 0x0031 00049 (badeq.go:8) JNE 70 0x0033 00051 (badeq.go:8) MOVBLZX "".autotmp_0+7(SP), AX 0x0038 00056 (badeq.go:8) CMPB AL, $2 0x003a 00058 (badeq.go:8) SETEQ AL 0x003d 00061 (badeq.go:8) MOVB AL, "".~r1+24(FP) 0x0041 00065 (badeq.go:8) ADDQ $8, SP 0x0045 00069 (badeq.go:8) RET 0x0046 00070 (badeq.go:8) MOVB $0, AL 0x0048 00072 (badeq.go:8) JMP 61 SSA after: "".g t=1 size=32 args=0x10 locals=0x0 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $0-16 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0000 00000 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (badeq.go:8) MOVBLZX "".a+8(FP), AX 0x0005 00005 (badeq.go:8) CMPB AL, $1 0x0007 00007 (badeq.go:8) JNE 25 0x0009 00009 (badeq.go:8) MOVBLZX "".a+9(FP), CX 0x000e 00014 (badeq.go:8) CMPB CL, $2 0x0011 00017 (badeq.go:8) SETEQ AL 0x0014 00020 (badeq.go:8) MOVB AL, "".~r1+16(FP) 0x0018 00024 (badeq.go:8) RET 0x0019 00025 (badeq.go:8) MOVB $0, AL 0x001b 00027 (badeq.go:8) JMP 20 Old backend before: "".g t=1 size=96 args=0x10 locals=0x8 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $8-16 0x0000 00000 (badeq.go:7) SUBQ $8, SP 0x0004 00004 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0004 00004 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0004 00004 (badeq.go:8) MOVBQZX "".a+16(FP), BX 0x0009 00009 (badeq.go:8) MOVB BL, "".autotmp_0+6(SP) 0x000d 00013 (badeq.go:8) MOVBQZX "".a+17(FP), BX 0x0012 00018 (badeq.go:8) MOVB BL, "".autotmp_0+7(SP) 0x0016 00022 (badeq.go:8) MOVQ $0, CX 0x0018 00024 (badeq.go:8) MOVB CL, "".autotmp_1+4(SP) 0x001c 00028 (badeq.go:8) MOVB CL, "".autotmp_1+5(SP) 0x0020 00032 (badeq.go:8) MOVB $1, "".autotmp_1+4(SP) 0x0025 00037 (badeq.go:8) MOVB $2, "".autotmp_1+5(SP) 0x002a 00042 (badeq.go:8) LEAQ "".autotmp_0+6(SP), CX 0x002f 00047 (badeq.go:8) LEAQ "".autotmp_1+4(SP), BX 0x0034 00052 (badeq.go:8) MOVQ BX, AX 0x0037 00055 (badeq.go:8) NOP 0x0037 00055 (badeq.go:8) MOVBQZX (CX), BX 0x003a 00058 (badeq.go:8) NOP 0x003a 00058 (badeq.go:8) MOVBQZX (AX), BP 0x003d 00061 (badeq.go:8) CMPB BL, BPB 0x0040 00064 (badeq.go:8) JNE 87 0x0042 00066 (badeq.go:8) NOP 0x0042 00066 (badeq.go:8) MOVBQZX 1(CX), BX 0x0046 00070 (badeq.go:8) NOP 0x0046 00070 (badeq.go:8) MOVBQZX 1(AX), BP 0x004a 00074 (badeq.go:8) CMPB BL, BPB 0x004d 00077 (badeq.go:8) SETEQ "".~r1+24(FP) 0x0052 00082 (badeq.go:8) ADDQ $8, SP 0x0056 00086 (badeq.go:8) RET 0x0057 00087 (badeq.go:8) MOVB $0, "".~r1+24(FP) 0x005c 00092 (badeq.go:8) JMP 82 Old backend after: "".g t=1 size=64 args=0x10 locals=0x0 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $0-16 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0000 00000 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (badeq.go:8) NOP 0x0000 00000 (badeq.go:8) MOVBQZX "".a+8(FP), BX 0x0005 00005 (badeq.go:8) MOVQ BX, BP 0x0008 00008 (badeq.go:8) MOVBQZX "".a+9(FP), BX 0x000d 00013 (badeq.go:8) MOVQ BX, DX 0x0010 00016 (badeq.go:8) NOP 0x0010 00016 (badeq.go:8) MOVQ $0, BX 0x0012 00018 (badeq.go:8) NOP 0x0012 00018 (badeq.go:8) MOVQ $1, CX 0x0019 00025 (badeq.go:8) MOVQ $2, AX 0x0020 00032 (badeq.go:8) CMPB BPB, CL 0x0023 00035 (badeq.go:8) JNE 45 0x0025 00037 (badeq.go:8) CMPB DL, AL 0x0027 00039 (badeq.go:8) SETEQ "".~r1+16(FP) 0x002c 00044 (badeq.go:8) RET 0x002d 00045 (badeq.go:8) MOVB $0, "".~r1+16(FP) 0x0032 00050 (badeq.go:8) JMP 44 Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86
DO NOT REVIEW [There is no clear way to improve things for all backends. So this CL is going to wait until we are SSA by default everywhere.] This CL reworks walkcompare for clarity and concision. It also makes one significant functional change. (The functional change is hard to separate cleanly from the cleanup, so I just did them together.) When inlining and unrolling an equality comparison for a small struct or array, compare the elements like: a[0] == b[0] && a[1] == b[1] rather than pa := &a pb := &b pa[0] == pb[0] && pa[1] == pb[1] The result is the same, but taking the address and working through the indirect forces the backends to generate less efficient code. Updates golang#15303 Sample code: type T struct { a, b int8 } func g(a T) bool { return a == T{1, 2} } SSA before: "".g t=1 size=80 args=0x10 locals=0x8 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $8-16 0x0000 00000 (badeq.go:7) SUBQ $8, SP 0x0004 00004 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0004 00004 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0004 00004 (badeq.go:8) MOVBLZX "".a+16(FP), AX 0x0009 00009 (badeq.go:8) MOVB AL, "".autotmp_0+6(SP) 0x000d 00013 (badeq.go:8) MOVBLZX "".a+17(FP), AX 0x0012 00018 (badeq.go:8) MOVB AL, "".autotmp_0+7(SP) 0x0016 00022 (badeq.go:8) MOVB $0, "".autotmp_1+4(SP) 0x001b 00027 (badeq.go:8) MOVB $1, "".autotmp_1+4(SP) 0x0020 00032 (badeq.go:8) MOVB $2, "".autotmp_1+5(SP) 0x0025 00037 (badeq.go:8) MOVBLZX "".autotmp_0+6(SP), AX 0x002a 00042 (badeq.go:8) MOVBLZX "".autotmp_1+4(SP), CX 0x002f 00047 (badeq.go:8) CMPB AL, CL 0x0031 00049 (badeq.go:8) JNE 70 0x0033 00051 (badeq.go:8) MOVBLZX "".autotmp_0+7(SP), AX 0x0038 00056 (badeq.go:8) CMPB AL, $2 0x003a 00058 (badeq.go:8) SETEQ AL 0x003d 00061 (badeq.go:8) MOVB AL, "".~r1+24(FP) 0x0041 00065 (badeq.go:8) ADDQ $8, SP 0x0045 00069 (badeq.go:8) RET 0x0046 00070 (badeq.go:8) MOVB $0, AL 0x0048 00072 (badeq.go:8) JMP 61 SSA after: "".g t=1 size=32 args=0x10 locals=0x0 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $0-16 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0000 00000 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (badeq.go:8) MOVBLZX "".a+8(FP), AX 0x0005 00005 (badeq.go:8) CMPB AL, $1 0x0007 00007 (badeq.go:8) JNE 25 0x0009 00009 (badeq.go:8) MOVBLZX "".a+9(FP), CX 0x000e 00014 (badeq.go:8) CMPB CL, $2 0x0011 00017 (badeq.go:8) SETEQ AL 0x0014 00020 (badeq.go:8) MOVB AL, "".~r1+16(FP) 0x0018 00024 (badeq.go:8) RET 0x0019 00025 (badeq.go:8) MOVB $0, AL 0x001b 00027 (badeq.go:8) JMP 20 Old backend before: "".g t=1 size=96 args=0x10 locals=0x8 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $8-16 0x0000 00000 (badeq.go:7) SUBQ $8, SP 0x0004 00004 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0004 00004 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0004 00004 (badeq.go:8) MOVBQZX "".a+16(FP), BX 0x0009 00009 (badeq.go:8) MOVB BL, "".autotmp_0+6(SP) 0x000d 00013 (badeq.go:8) MOVBQZX "".a+17(FP), BX 0x0012 00018 (badeq.go:8) MOVB BL, "".autotmp_0+7(SP) 0x0016 00022 (badeq.go:8) MOVQ $0, CX 0x0018 00024 (badeq.go:8) MOVB CL, "".autotmp_1+4(SP) 0x001c 00028 (badeq.go:8) MOVB CL, "".autotmp_1+5(SP) 0x0020 00032 (badeq.go:8) MOVB $1, "".autotmp_1+4(SP) 0x0025 00037 (badeq.go:8) MOVB $2, "".autotmp_1+5(SP) 0x002a 00042 (badeq.go:8) LEAQ "".autotmp_0+6(SP), CX 0x002f 00047 (badeq.go:8) LEAQ "".autotmp_1+4(SP), BX 0x0034 00052 (badeq.go:8) MOVQ BX, AX 0x0037 00055 (badeq.go:8) NOP 0x0037 00055 (badeq.go:8) MOVBQZX (CX), BX 0x003a 00058 (badeq.go:8) NOP 0x003a 00058 (badeq.go:8) MOVBQZX (AX), BP 0x003d 00061 (badeq.go:8) CMPB BL, BPB 0x0040 00064 (badeq.go:8) JNE 87 0x0042 00066 (badeq.go:8) NOP 0x0042 00066 (badeq.go:8) MOVBQZX 1(CX), BX 0x0046 00070 (badeq.go:8) NOP 0x0046 00070 (badeq.go:8) MOVBQZX 1(AX), BP 0x004a 00074 (badeq.go:8) CMPB BL, BPB 0x004d 00077 (badeq.go:8) SETEQ "".~r1+24(FP) 0x0052 00082 (badeq.go:8) ADDQ $8, SP 0x0056 00086 (badeq.go:8) RET 0x0057 00087 (badeq.go:8) MOVB $0, "".~r1+24(FP) 0x005c 00092 (badeq.go:8) JMP 82 Old backend after: "".g t=1 size=64 args=0x10 locals=0x0 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $0-16 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0000 00000 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (badeq.go:8) NOP 0x0000 00000 (badeq.go:8) MOVBQZX "".a+8(FP), BX 0x0005 00005 (badeq.go:8) MOVQ BX, BP 0x0008 00008 (badeq.go:8) MOVBQZX "".a+9(FP), BX 0x000d 00013 (badeq.go:8) MOVQ BX, DX 0x0010 00016 (badeq.go:8) NOP 0x0010 00016 (badeq.go:8) MOVQ $0, BX 0x0012 00018 (badeq.go:8) NOP 0x0012 00018 (badeq.go:8) MOVQ $1, CX 0x0019 00025 (badeq.go:8) MOVQ $2, AX 0x0020 00032 (badeq.go:8) CMPB BPB, CL 0x0023 00035 (badeq.go:8) JNE 45 0x0025 00037 (badeq.go:8) CMPB DL, AL 0x0027 00039 (badeq.go:8) SETEQ "".~r1+16(FP) 0x002c 00044 (badeq.go:8) RET 0x002d 00045 (badeq.go:8) MOVB $0, "".~r1+16(FP) 0x0032 00050 (badeq.go:8) JMP 44 Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86
DO NOT REVIEW [There is no clear way to improve things for all backends. So this CL is going to wait until we are SSA by default everywhere.] This CL reworks walkcompare for clarity and concision. It also makes one significant functional change. (The functional change is hard to separate cleanly from the cleanup, so I just did them together.) When inlining and unrolling an equality comparison for a small struct or array, compare the elements like: a[0] == b[0] && a[1] == b[1] rather than pa := &a pb := &b pa[0] == pb[0] && pa[1] == pb[1] The result is the same, but taking the address and working through the indirect forces the backends to generate less efficient code. Updates golang#15303 Sample code: type T struct { a, b int8 } func g(a T) bool { return a == T{1, 2} } SSA before: "".g t=1 size=80 args=0x10 locals=0x8 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $8-16 0x0000 00000 (badeq.go:7) SUBQ $8, SP 0x0004 00004 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0004 00004 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0004 00004 (badeq.go:8) MOVBLZX "".a+16(FP), AX 0x0009 00009 (badeq.go:8) MOVB AL, "".autotmp_0+6(SP) 0x000d 00013 (badeq.go:8) MOVBLZX "".a+17(FP), AX 0x0012 00018 (badeq.go:8) MOVB AL, "".autotmp_0+7(SP) 0x0016 00022 (badeq.go:8) MOVB $0, "".autotmp_1+4(SP) 0x001b 00027 (badeq.go:8) MOVB $1, "".autotmp_1+4(SP) 0x0020 00032 (badeq.go:8) MOVB $2, "".autotmp_1+5(SP) 0x0025 00037 (badeq.go:8) MOVBLZX "".autotmp_0+6(SP), AX 0x002a 00042 (badeq.go:8) MOVBLZX "".autotmp_1+4(SP), CX 0x002f 00047 (badeq.go:8) CMPB AL, CL 0x0031 00049 (badeq.go:8) JNE 70 0x0033 00051 (badeq.go:8) MOVBLZX "".autotmp_0+7(SP), AX 0x0038 00056 (badeq.go:8) CMPB AL, $2 0x003a 00058 (badeq.go:8) SETEQ AL 0x003d 00061 (badeq.go:8) MOVB AL, "".~r1+24(FP) 0x0041 00065 (badeq.go:8) ADDQ $8, SP 0x0045 00069 (badeq.go:8) RET 0x0046 00070 (badeq.go:8) MOVB $0, AL 0x0048 00072 (badeq.go:8) JMP 61 SSA after: "".g t=1 size=32 args=0x10 locals=0x0 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $0-16 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0000 00000 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (badeq.go:8) MOVBLZX "".a+8(FP), AX 0x0005 00005 (badeq.go:8) CMPB AL, $1 0x0007 00007 (badeq.go:8) JNE 25 0x0009 00009 (badeq.go:8) MOVBLZX "".a+9(FP), CX 0x000e 00014 (badeq.go:8) CMPB CL, $2 0x0011 00017 (badeq.go:8) SETEQ AL 0x0014 00020 (badeq.go:8) MOVB AL, "".~r1+16(FP) 0x0018 00024 (badeq.go:8) RET 0x0019 00025 (badeq.go:8) MOVB $0, AL 0x001b 00027 (badeq.go:8) JMP 20 Old backend before: "".g t=1 size=96 args=0x10 locals=0x8 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $8-16 0x0000 00000 (badeq.go:7) SUBQ $8, SP 0x0004 00004 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0004 00004 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0004 00004 (badeq.go:8) MOVBQZX "".a+16(FP), BX 0x0009 00009 (badeq.go:8) MOVB BL, "".autotmp_0+6(SP) 0x000d 00013 (badeq.go:8) MOVBQZX "".a+17(FP), BX 0x0012 00018 (badeq.go:8) MOVB BL, "".autotmp_0+7(SP) 0x0016 00022 (badeq.go:8) MOVQ $0, CX 0x0018 00024 (badeq.go:8) MOVB CL, "".autotmp_1+4(SP) 0x001c 00028 (badeq.go:8) MOVB CL, "".autotmp_1+5(SP) 0x0020 00032 (badeq.go:8) MOVB $1, "".autotmp_1+4(SP) 0x0025 00037 (badeq.go:8) MOVB $2, "".autotmp_1+5(SP) 0x002a 00042 (badeq.go:8) LEAQ "".autotmp_0+6(SP), CX 0x002f 00047 (badeq.go:8) LEAQ "".autotmp_1+4(SP), BX 0x0034 00052 (badeq.go:8) MOVQ BX, AX 0x0037 00055 (badeq.go:8) NOP 0x0037 00055 (badeq.go:8) MOVBQZX (CX), BX 0x003a 00058 (badeq.go:8) NOP 0x003a 00058 (badeq.go:8) MOVBQZX (AX), BP 0x003d 00061 (badeq.go:8) CMPB BL, BPB 0x0040 00064 (badeq.go:8) JNE 87 0x0042 00066 (badeq.go:8) NOP 0x0042 00066 (badeq.go:8) MOVBQZX 1(CX), BX 0x0046 00070 (badeq.go:8) NOP 0x0046 00070 (badeq.go:8) MOVBQZX 1(AX), BP 0x004a 00074 (badeq.go:8) CMPB BL, BPB 0x004d 00077 (badeq.go:8) SETEQ "".~r1+24(FP) 0x0052 00082 (badeq.go:8) ADDQ $8, SP 0x0056 00086 (badeq.go:8) RET 0x0057 00087 (badeq.go:8) MOVB $0, "".~r1+24(FP) 0x005c 00092 (badeq.go:8) JMP 82 Old backend after: "".g t=1 size=64 args=0x10 locals=0x0 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $0-16 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0000 00000 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (badeq.go:8) NOP 0x0000 00000 (badeq.go:8) MOVBQZX "".a+8(FP), BX 0x0005 00005 (badeq.go:8) MOVQ BX, BP 0x0008 00008 (badeq.go:8) MOVBQZX "".a+9(FP), BX 0x000d 00013 (badeq.go:8) MOVQ BX, DX 0x0010 00016 (badeq.go:8) NOP 0x0010 00016 (badeq.go:8) MOVQ $0, BX 0x0012 00018 (badeq.go:8) NOP 0x0012 00018 (badeq.go:8) MOVQ $1, CX 0x0019 00025 (badeq.go:8) MOVQ $2, AX 0x0020 00032 (badeq.go:8) CMPB BPB, CL 0x0023 00035 (badeq.go:8) JNE 45 0x0025 00037 (badeq.go:8) CMPB DL, AL 0x0027 00039 (badeq.go:8) SETEQ "".~r1+16(FP) 0x002c 00044 (badeq.go:8) RET 0x002d 00045 (badeq.go:8) MOVB $0, "".~r1+16(FP) 0x0032 00050 (badeq.go:8) JMP 44 Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86
DO NOT REVIEW [There is no clear way to improve things for all backends. So this CL is going to wait until we are SSA by default everywhere.] This CL reworks walkcompare for clarity and concision. It also makes one significant functional change. (The functional change is hard to separate cleanly from the cleanup, so I just did them together.) When inlining and unrolling an equality comparison for a small struct or array, compare the elements like: a[0] == b[0] && a[1] == b[1] rather than pa := &a pb := &b pa[0] == pb[0] && pa[1] == pb[1] The result is the same, but taking the address and working through the indirect forces the backends to generate less efficient code. Updates golang#15303 Sample code: type T struct { a, b int8 } func g(a T) bool { return a == T{1, 2} } SSA before: "".g t=1 size=80 args=0x10 locals=0x8 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $8-16 0x0000 00000 (badeq.go:7) SUBQ $8, SP 0x0004 00004 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0004 00004 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0004 00004 (badeq.go:8) MOVBLZX "".a+16(FP), AX 0x0009 00009 (badeq.go:8) MOVB AL, "".autotmp_0+6(SP) 0x000d 00013 (badeq.go:8) MOVBLZX "".a+17(FP), AX 0x0012 00018 (badeq.go:8) MOVB AL, "".autotmp_0+7(SP) 0x0016 00022 (badeq.go:8) MOVB $0, "".autotmp_1+4(SP) 0x001b 00027 (badeq.go:8) MOVB $1, "".autotmp_1+4(SP) 0x0020 00032 (badeq.go:8) MOVB $2, "".autotmp_1+5(SP) 0x0025 00037 (badeq.go:8) MOVBLZX "".autotmp_0+6(SP), AX 0x002a 00042 (badeq.go:8) MOVBLZX "".autotmp_1+4(SP), CX 0x002f 00047 (badeq.go:8) CMPB AL, CL 0x0031 00049 (badeq.go:8) JNE 70 0x0033 00051 (badeq.go:8) MOVBLZX "".autotmp_0+7(SP), AX 0x0038 00056 (badeq.go:8) CMPB AL, $2 0x003a 00058 (badeq.go:8) SETEQ AL 0x003d 00061 (badeq.go:8) MOVB AL, "".~r1+24(FP) 0x0041 00065 (badeq.go:8) ADDQ $8, SP 0x0045 00069 (badeq.go:8) RET 0x0046 00070 (badeq.go:8) MOVB $0, AL 0x0048 00072 (badeq.go:8) JMP 61 SSA after: "".g t=1 size=32 args=0x10 locals=0x0 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $0-16 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0000 00000 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (badeq.go:8) MOVBLZX "".a+8(FP), AX 0x0005 00005 (badeq.go:8) CMPB AL, $1 0x0007 00007 (badeq.go:8) JNE 25 0x0009 00009 (badeq.go:8) MOVBLZX "".a+9(FP), CX 0x000e 00014 (badeq.go:8) CMPB CL, $2 0x0011 00017 (badeq.go:8) SETEQ AL 0x0014 00020 (badeq.go:8) MOVB AL, "".~r1+16(FP) 0x0018 00024 (badeq.go:8) RET 0x0019 00025 (badeq.go:8) MOVB $0, AL 0x001b 00027 (badeq.go:8) JMP 20 Old backend before: "".g t=1 size=96 args=0x10 locals=0x8 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $8-16 0x0000 00000 (badeq.go:7) SUBQ $8, SP 0x0004 00004 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0004 00004 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0004 00004 (badeq.go:8) MOVBQZX "".a+16(FP), BX 0x0009 00009 (badeq.go:8) MOVB BL, "".autotmp_0+6(SP) 0x000d 00013 (badeq.go:8) MOVBQZX "".a+17(FP), BX 0x0012 00018 (badeq.go:8) MOVB BL, "".autotmp_0+7(SP) 0x0016 00022 (badeq.go:8) MOVQ $0, CX 0x0018 00024 (badeq.go:8) MOVB CL, "".autotmp_1+4(SP) 0x001c 00028 (badeq.go:8) MOVB CL, "".autotmp_1+5(SP) 0x0020 00032 (badeq.go:8) MOVB $1, "".autotmp_1+4(SP) 0x0025 00037 (badeq.go:8) MOVB $2, "".autotmp_1+5(SP) 0x002a 00042 (badeq.go:8) LEAQ "".autotmp_0+6(SP), CX 0x002f 00047 (badeq.go:8) LEAQ "".autotmp_1+4(SP), BX 0x0034 00052 (badeq.go:8) MOVQ BX, AX 0x0037 00055 (badeq.go:8) NOP 0x0037 00055 (badeq.go:8) MOVBQZX (CX), BX 0x003a 00058 (badeq.go:8) NOP 0x003a 00058 (badeq.go:8) MOVBQZX (AX), BP 0x003d 00061 (badeq.go:8) CMPB BL, BPB 0x0040 00064 (badeq.go:8) JNE 87 0x0042 00066 (badeq.go:8) NOP 0x0042 00066 (badeq.go:8) MOVBQZX 1(CX), BX 0x0046 00070 (badeq.go:8) NOP 0x0046 00070 (badeq.go:8) MOVBQZX 1(AX), BP 0x004a 00074 (badeq.go:8) CMPB BL, BPB 0x004d 00077 (badeq.go:8) SETEQ "".~r1+24(FP) 0x0052 00082 (badeq.go:8) ADDQ $8, SP 0x0056 00086 (badeq.go:8) RET 0x0057 00087 (badeq.go:8) MOVB $0, "".~r1+24(FP) 0x005c 00092 (badeq.go:8) JMP 82 Old backend after: "".g t=1 size=64 args=0x10 locals=0x0 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $0-16 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0000 00000 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (badeq.go:8) NOP 0x0000 00000 (badeq.go:8) MOVBQZX "".a+8(FP), BX 0x0005 00005 (badeq.go:8) MOVQ BX, BP 0x0008 00008 (badeq.go:8) MOVBQZX "".a+9(FP), BX 0x000d 00013 (badeq.go:8) MOVQ BX, DX 0x0010 00016 (badeq.go:8) NOP 0x0010 00016 (badeq.go:8) MOVQ $0, BX 0x0012 00018 (badeq.go:8) NOP 0x0012 00018 (badeq.go:8) MOVQ $1, CX 0x0019 00025 (badeq.go:8) MOVQ $2, AX 0x0020 00032 (badeq.go:8) CMPB BPB, CL 0x0023 00035 (badeq.go:8) JNE 45 0x0025 00037 (badeq.go:8) CMPB DL, AL 0x0027 00039 (badeq.go:8) SETEQ "".~r1+16(FP) 0x002c 00044 (badeq.go:8) RET 0x002d 00045 (badeq.go:8) MOVB $0, "".~r1+16(FP) 0x0032 00050 (badeq.go:8) JMP 44 Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86
This CL reworks walkcompare for clarity and concision. It also makes one significant functional change. (The functional change is hard to separate cleanly from the cleanup, so I just did them together.) When inlining and unrolling an equality comparison for a small struct or array, compare the elements like: a[0] == b[0] && a[1] == b[1] rather than pa := &a pb := &b pa[0] == pb[0] && pa[1] == pb[1] The result is the same, but taking the address and working through the indirect forces the backends to generate less efficient code. This is only an improvement with the SSA backend. However, every port but s390x now has a working SSA backend, and switching to the SSA backend by default everywhere is a priority for Go 1.8. It thus seems reasonable to start to prioritize SSA performance over the old backend. Updates #15303 Sample code: type T struct { a, b int8 } func g(a T) bool { return a == T{1, 2} } SSA before: "".g t=1 size=80 args=0x10 locals=0x8 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $8-16 0x0000 00000 (badeq.go:7) SUBQ $8, SP 0x0004 00004 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0004 00004 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0004 00004 (badeq.go:8) MOVBLZX "".a+16(FP), AX 0x0009 00009 (badeq.go:8) MOVB AL, "".autotmp_0+6(SP) 0x000d 00013 (badeq.go:8) MOVBLZX "".a+17(FP), AX 0x0012 00018 (badeq.go:8) MOVB AL, "".autotmp_0+7(SP) 0x0016 00022 (badeq.go:8) MOVB $0, "".autotmp_1+4(SP) 0x001b 00027 (badeq.go:8) MOVB $1, "".autotmp_1+4(SP) 0x0020 00032 (badeq.go:8) MOVB $2, "".autotmp_1+5(SP) 0x0025 00037 (badeq.go:8) MOVBLZX "".autotmp_0+6(SP), AX 0x002a 00042 (badeq.go:8) MOVBLZX "".autotmp_1+4(SP), CX 0x002f 00047 (badeq.go:8) CMPB AL, CL 0x0031 00049 (badeq.go:8) JNE 70 0x0033 00051 (badeq.go:8) MOVBLZX "".autotmp_0+7(SP), AX 0x0038 00056 (badeq.go:8) CMPB AL, $2 0x003a 00058 (badeq.go:8) SETEQ AL 0x003d 00061 (badeq.go:8) MOVB AL, "".~r1+24(FP) 0x0041 00065 (badeq.go:8) ADDQ $8, SP 0x0045 00069 (badeq.go:8) RET 0x0046 00070 (badeq.go:8) MOVB $0, AL 0x0048 00072 (badeq.go:8) JMP 61 SSA after: "".g t=1 size=32 args=0x10 locals=0x0 0x0000 00000 (badeq.go:7) TEXT "".g(SB), $0-16 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) NOP 0x0000 00000 (badeq.go:7) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB) 0x0000 00000 (badeq.go:7) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (badeq.go:8) MOVBLZX "".a+8(FP), AX 0x0005 00005 (badeq.go:8) CMPB AL, $1 0x0007 00007 (badeq.go:8) JNE 25 0x0009 00009 (badeq.go:8) MOVBLZX "".a+9(FP), CX 0x000e 00014 (badeq.go:8) CMPB CL, $2 0x0011 00017 (badeq.go:8) SETEQ AL 0x0014 00020 (badeq.go:8) MOVB AL, "".~r1+16(FP) 0x0018 00024 (badeq.go:8) RET 0x0019 00025 (badeq.go:8) MOVB $0, AL 0x001b 00027 (badeq.go:8) JMP 20 Change-Id: I120185d58012b7bbcdb1ec01225b5b08d0855d86 Reviewed-on: https://go-review.googlesource.com/22277 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This seems fixed. If there are any outstanding remnants (unnecessary zeroing?) you can open a different issue. |
[partial move from #15164]
With ssa, this generates the following code:
This is better than the old backend, but still way too much. Some of the fat is due to #15300, but there's still an unnecessary zero at 0x0016 and unnecessary spills.
The unnecessary zero is introduced during the decompose user phase.
The unnecessary spills are probably due to how the front end does small struct equality, but I haven't double-checked.
cc @mdempsky @brtzsnr @randall77
The text was updated successfully, but these errors were encountered: